2024 Big Data Bowl: Exploratory Combine Data Analysis

I'm beginning some exploratory data analysis with the 2024 Big Data Bowl (BDB) data. This year's theme is tackling: creating metrics or visualizations that help explain tackling in the NFL.

One early thought I had was whether a defensive player's arm length, hand size, and bench press numbers (bench press is an indicator of grip strength) have any correlation with better tackling percentages. I haven't figured out a good way to measure a tackler's effectiveness yet, so I used a basic "(opportunities - missed tackles) / opportunities" percentage as a simple measure.
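A minimal sketch of that percentage in pandas. The column names (`nflId`, `tackle`, `assist`, `pff_missedTackle`) and the definition of an "opportunity" as any tackle, assist, or missed tackle are assumptions made for illustration, not necessarily how the numbers in this post were computed.

```python
import pandas as pd

# Assumed column names; adjust to match the actual tackle file.
COUNT_COLS = ["tackle", "assist", "pff_missedTackle"]


def tackle_made_pct(tackles: pd.DataFrame) -> pd.DataFrame:
    """Per-player (opportunities - missed tackles) / opportunities."""
    per_player = tackles.groupby("nflId")[COUNT_COLS].sum()
    # Assumption: an opportunity is any tackle, assist, or missed tackle.
    per_player["opportunities"] = per_player[COUNT_COLS].sum(axis=1)
    made = per_player["opportunities"] - per_player["pff_missedTackle"]
    # Players with zero opportunities get NaN instead of a divide-by-zero.
    denom = per_player["opportunities"].where(per_player["opportunities"] > 0)
    per_player["tackle_pct"] = made / denom
    return per_player


# Toy rows, invented purely to show the shape of the input.
toy = pd.DataFrame(
    {
        "nflId": [1, 1, 2],
        "tackle": [1, 0, 1],
        "assist": [0, 1, 0],
        "pff_missedTackle": [0, 1, 1],
    }
)
result = tackle_made_pct(toy)
print(result[["opportunities", "tackle_pct"]])
```

Counting assists as made tackles is a judgment call; dropping `assist` from `COUNT_COLS` gives a stricter version of the same measure.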

I pulled the tackle data from the Big Data Bowl website and got Combine data for the 2017-2022 NFL Drafts from Steelers Depot, one of the most comprehensive NFL Combine data sources. We're allowed to use any publicly available dataset, so these (somewhat incomplete) Combine results would have to do.

I needed to do some data cleaning on the Combine data to standardize it and turn fractional measurements like arm length (32 3/8") into plottable decimals (32.375). Then I matched the Combine rows to the tackle data provided by BDB by player name. The heatmap above was created simply by uploading a spreadsheet into GPT-4's Advanced Data Analysis tool and asking for a correlation matrix visualization.
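The cleaning and name-matching steps could look roughly like the sketch below. The helper names (`inches_to_decimal`, `name_key`), the column names, and the two toy players are hypothetical and exist only to illustrate the idea.

```python
import math
from fractions import Fraction

import pandas as pd


def inches_to_decimal(text) -> float:
    """Convert strings like '32 3/8"' to decimal inches (32.375)."""
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return math.nan
    cleaned = str(text).replace('"', "").strip()
    if not cleaned:
        return math.nan
    # '32 3/8' -> Fraction('32') + Fraction('3/8')
    return float(sum(Fraction(part) for part in cleaned.split()))


def name_key(name: str) -> str:
    """Lowercase a name and drop punctuation so spelling variants match."""
    kept = (ch for ch in name.lower() if ch.isalnum() or ch == " ")
    return "".join(kept).strip()


# Toy tables with made-up players and values.
combine = pd.DataFrame(
    {"Name": ["A.J. Example", "Pat Sample"], "ARM": ['32 3/8"', '33"']}
)
players = pd.DataFrame(
    {"displayName": ["AJ Example", "Pat Sample"], "tackle_pct": [0.9, 0.8]}
)

combine["ARM"] = combine["ARM"].map(inches_to_decimal)
combine["key"] = combine["Name"].map(name_key)
players["key"] = players["displayName"].map(name_key)

# An inner join keeps only players present in both sources.
merged = combine.merge(players, on="key", how="inner")
print(merged[["Name", "ARM", "tackle_pct"]])
```

Matching on names alone can mis-pair players who share a name, so a second key such as position or college would make the join safer where it is available.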

The quick answer to my question is below:

  • No physical measurement from the Combine data is meaningfully correlated with tackle made percentage. Arm length and vertical jump are the highest at 0.03, so they're all basically 0.
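For anyone who wants to check this kind of result without GPT-4, a hedged sketch with pandas: correlate every numeric Combine column against the tackle percentage and sort by absolute value. The `tackle_pct` column name and the toy values are assumptions for illustration and are not the real data.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "tackle_pct") -> pd.Series:
    """Pearson r of each numeric column against the target, largest |r| first."""
    numeric = df.select_dtypes("number")
    r = numeric.drop(columns=target).corrwith(numeric[target])
    return r.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy values, invented only to show the call.
toy = pd.DataFrame(
    {
        "ARM": [31.0, 32.0, 33.0, 34.0],
        "VJ": [30.0, 34.0, 32.0, 36.0],
        "tackle_pct": [0.80, 0.85, 0.90, 0.95],
    }
)
ranked = correlations_with_target(toy)
print(ranked)
```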

_______________

Then I asked GPT-4 for the top 20 correlations, sorted by absolute value (so strongly positive and strongly negative correlations both rank high), and what they mean. The info below is copied and pasted from its response. Some interesting (and obvious) pairs emerged:

The top 20 correlation pairs, ranked by absolute value, are as follows:

1. **10 YD and 40 YD**: Correlation of 0.92 indicates a very strong positive relationship. Faster 10-yard dash times likely mean faster 40-yard dash times.

2. **SS and 3C**: Correlation of 0.89 indicates a strong positive relationship. Better scores in the Short Shuttle likely mean better scores in the 3-Cone Drill.

3. **Weight and 40 YD**: Correlation of 0.85 suggests that as weight increases, the 40-yard dash time also likely increases.

4. **BJ and 40 YD**: Correlation of -0.82 indicates a strong negative relationship. Better Broad Jump scores likely mean faster 40-yard dash times.

5. **10 YD and BJ**: Correlation of -0.82 shows that better 10-yard dash times likely mean better Broad Jump scores.

6. **40 YD and 3C**: Correlation of 0.81 indicates a strong positive relationship. Faster 40-yard dash times likely mean better 3-Cone Drill scores.

7. **10 YD and Weight**: Correlation of 0.81 suggests that as weight increases, the 10-yard dash time also likely increases.

8. **BJ and VJ**: Correlation of 0.80 shows a strong positive relationship. Better Broad Jump scores likely mean better Vertical Jump scores.

9. **SS and 40 YD**: Correlation of 0.77 indicates a strong positive relationship. Better Short Shuttle scores likely mean faster 40-yard dash times.

10. **3C and Weight**: Correlation of 0.77 suggests that as weight increases, the 3-Cone Drill time likely increases (a slower drill).

11. **VJ and 10 YD**: Correlation of -0.77 indicates a strong negative relationship. Better Vertical Jump scores likely mean faster 10-yard dash times.

12. **SS and Weight**: Correlation of 0.76 suggests that as weight increases, the Short Shuttle time likely increases (a slower drill).

13. **10 YD and 3C**: Correlation of 0.74 shows a strong positive relationship. Faster 10-yard dash times likely mean better 3-Cone Drill scores.

14. **VJ and 40 YD**: Correlation of -0.73 indicates a strong negative relationship. Better Vertical Jump scores likely mean faster 40-yard dash times.

15. **ARM and Height**: Correlation of 0.73 shows a strong positive relationship. Taller height likely means longer arm length.

16. **225 BENCH and Weight**: Correlation of 0.71 indicates that as weight increases, the 225 Bench Press score likely increases.

17. **10 YD and SS**: Correlation of 0.70 shows a strong positive relationship. Faster 10-yard dash times likely mean better Short Shuttle scores.

18. **3C and VJ**: Correlation of -0.69 indicates a strong negative relationship. Better 3-Cone Drill scores likely mean better Vertical Jump scores.

19. **3C and BJ**: Correlation of -0.68 shows a strong negative relationship. Better 3-Cone Drill scores likely mean better Broad Jump scores.

20. **SS and BJ**: Correlation of -0.68 indicates a strong negative relationship. Better Short Shuttle scores likely mean better Broad Jump scores.
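A ranking like the one above can be reproduced with a few lines of pandas. This is a sketch under assumptions, not the code GPT-4 actually ran: `top_correlation_pairs` is a hypothetical helper, and the toy columns are made up.

```python
from itertools import combinations

import pandas as pd


def top_correlation_pairs(df: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Return the n column pairs with the largest absolute Pearson r."""
    corr = df.select_dtypes("number").corr()
    # combinations() visits each unordered pair once and skips the diagonal.
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "r"]).dropna()
    pairs = pairs.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).reset_index(drop=True)


# Toy columns, invented only to show the call.
toy = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "z": [4.0, 3.0, 2.0, 1.0],
        "w": [1.0, 3.0, 2.0, 4.0],
    }
)
top = top_correlation_pairs(toy, n=20)
print(top)
```

When reading the signs, keep in mind that lower is better for timed drills (10 YD, 40 YD, SS, 3C) while higher is better for jumps and bench reps, so a negative correlation between a sprint and a jump means good performances go together.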