Thoughts on trying to predict the NFL Draft

Analyzing the relationships between the variables for NFL draft prospects was very interesting, and I could have kept digging forever, but ultimately I needed to move on and try to create a prediction formula for the draft selections.

My first attempt was to predict the aforementioned draft value of each prospect from their physical measurements, top 100 composite ranking score, aggregated mock draft score and number of team visits.  It did pretty well on the accuracy measure I used in my earlier post comparing the NFL pundits' predictions.  I do want to reiterate that I graded on pick selection accuracy, not team-based accuracy, so it is more a reflection of what pick number a prospect should be than of which team he should go to.  It's just a quick way to make a general suggestion of what should happen.

Earlier I looked at the pundits' predicted vs. actual JJ Draft Value plots and ranked the pundits by how much of the actual draft value an algorithm could predict from their previous mock draft results.  Here is the prediction formula's plot.  It performed well, with an R-square value of XXX.  This would place it above the highest-ranking expert I looked at, Todd McShay, who had an R-square value of 0.76.  Not bad for someone who has no experience scouting and no inside NFL knowledge.


I also graded the prediction formula against the pundits a little differently.  This time, instead of using JJ Draft Value as the metric, I used pick selection for the 2013 Draft, comparing the prediction formula's projected picks to the experts' picks.  Here are the experts' picks:


You can see that Mel Kiper did much better in 2013 than he has done over his career, but Todd McShay is still doing better.  By the way, I don't mean to pick on Mel; he's just the most recognizable NFL Draft expert.  Below are my picks versus the actual selections in the 2013 Draft:


What I think this shows is that if you use even a very basic prediction algorithm of what pick a prospect is going to be selected at, rather than trying to predict what team will select them, you can get a better sense of a prospect's draft value than even some of the experts in the field.  
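The post does not spell out exactly how pick selection accuracy was scored, so the following is only a minimal sketch of one plausible way to do it: count exact pick hits and average the absolute gap between projected and actual pick. The function name and the prospect data are hypothetical.

```python
def pick_accuracy(predicted, actual):
    """Score projected picks against actual picks.

    Both arguments map prospect name -> pick number. Only prospects present
    in both mappings are scored (an assumption of this sketch).
    Returns (share of exact hits, mean absolute pick error).
    """
    shared = [name for name in predicted if name in actual]
    errors = [abs(predicted[name] - actual[name]) for name in shared]
    exact_share = sum(1 for e in errors if e == 0) / len(shared)
    mean_abs_error = sum(errors) / len(shared)
    return exact_share, mean_abs_error


# Made-up picks, purely for illustration.
projected = {"Prospect A": 1, "Prospect B": 2, "Prospect C": 5, "Prospect D": 9}
happened = {"Prospect A": 1, "Prospect B": 4, "Prospect C": 3, "Prospect D": 9}
print(pick_accuracy(projected, happened))
```

Scoring by pick number alone ignores which team made the selection, which matches the pick-based (rather than team-based) grading described above.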

I would love to see people take ideas from this project and expand on the research.  I'm sure it would be valuable to any NFL front office.  I will post my data after the project is complete.  In the next post, I will attempt to predict the 2014 draft order.



Exploratory Draft Data: Top 100 rankings

I deviated from the last topic I had planned to dive into at the outset of the project.  Instead of looking at each prospect's college statistics, I thought it would be more prudent to study each player's overall rankings.  College stats wouldn't be the best predictors because they depend heavily on the scheme and the caliber of teammates each prospect had, whereas overall rankings project each prospect relative to the other prospects at his position.  I think there will be some crossover effects with the mock draft rankings, but mock drafts project fit with teams and aren't true ordered rankings.


To create a composite style ranking similar to this one, I collected the top 100 rankings of 10 different draft pundits, averaged out their rankings to get a consensus and then ranked them.  Below are the top 50 results from 2013, since my test draft prediction will be run on that draft.


Pundits used:


DraftTek
Mike Mayock
NE Draft
Walter Football
Blogging the Boys
Gil Brandt
New Era Scouting
NFL Draft Scout
Matt Miller
Daniel Jeremiah



Player                  College             Rank  Avg
Luke Joeckel            Texas A&M           1     1.8
Eric Fisher             Central Michigan    2     3.1
Dion Jordan             Oregon              3     4.6
Sharrif Floyd           Florida             4     6.9
Chance Warmack          Alabama             5     7.2
Star Lotulelei          Utah                6     8.1
Lane Johnson            Oklahoma            7     8.8
Dee Milliner            Alabama             8     9.2
Ezekiel Ansah           Brigham Young       9     10.6
Jonathan Cooper         North Carolina      10    11.1
Sheldon Richardson      Missouri            11    11.2
Tavon Austin            West Virginia       12    13.22
Barkevious Mingo        LSU                 13    13.8
Kenny Vaccaro           Texas               14    15
Bjoern Werner           Florida State       15    17.9
Geno Smith              West Virginia       16    19.4
Jarvis Jones            Georgia             17    20.8
Tyler Eifert            Notre Dame          18    21.2
Xavier Rhodes           Florida State       19    21.4
Cordarrelle Patterson   Tennessee           20    21.4
Sylvester Williams      North Carolina      21    26.7
Alec Ogletree           Georgia             22    27.4
D.J. Fluker             Alabama             23    27.5
Cornellius Carradine    Florida State       24    28.7
Desmond Trufant         Washington          25    29.2
Jonathan Cyprien        Florida Int'l       26    30.9
Keenan Allen            California          27    31.7
Datone Jones            UCLA                28    32.2
Manti Te'o              Notre Dame          29    33.3
DeAndre Hopkins         Clemson             30    33.5
Arthur Brown            Kansas State        31    34.1
Damontre Moore          Texas A&M           32    36.9
Eddie Lacy              Alabama             33    37.11
Matt Elam               Florida             34    38.11
Johnthan Banks          Mississippi State   35    38.25
Kevin Minter            LSU                 36    39.3
Kawann Short            Purdue              37    40.4
D.J. Hayden             Houston             38    40.9
Jamar Taylor            Boise State         39    42.1
Robert Woods            USC                 40    42.3
Matt Barkley            USC                 41    42.88
Jesse Williams          Alabama             42    43.2
Eric Reid               LSU                 43    45.1
Zach Ertz               Stanford            44    45.5
Menelik Watson          Florida State       45    45.71
Justin Hunter           Tennessee           46    45.8
Alex Okafor             Texas               47    46.71
Johnathan Hankins       Ohio State          48    49.33
Larry Warford           Kentucky            49    51.22
Kyle Long               Oregon              50    51.44
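
The composite step (average each player's rank across the pundits, then rank the averages) can be sketched in a few lines. This is a minimal illustration, not the spreadsheet actually used. Averages such as 13.22 in the table suggest some players were missing from a list or two, so the sketch averages over only the lists that include a player; that handling is an assumption. The pundit and player names are placeholders.

```python
from statistics import mean

# Hypothetical boards: pundit -> {player: rank}. Values are invented.
rankings = {
    "pundit_a": {"Player X": 1, "Player Y": 2, "Player Z": 3},
    "pundit_b": {"Player X": 2, "Player Y": 1, "Player Z": 4},
    "pundit_c": {"Player X": 1, "Player Z": 2},  # Player Y left unranked
}


def composite_ranking(rankings):
    """Average each player's rank over the boards that list him,
    then order players by that average (lowest average first)."""
    collected = {}
    for board in rankings.values():
        for player, rank in board.items():
            collected.setdefault(player, []).append(rank)
    averages = {player: mean(ranks) for player, ranks in collected.items()}
    ordered = sorted(averages.items(), key=lambda item: item[1])
    return [
        (position, player, round(avg, 2))
        for position, (player, avg) in enumerate(ordered, start=1)
    ]


for position, player, avg in composite_ranking(rankings):
    print(position, player, avg)
```

A player who is left off several boards gets an average based on fewer opinions, so a fuller version might penalize missing ranks (for example, by treating them as 101) instead of skipping them.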

Exploratory Draft Data: Evaluating team visits

Another aspect I wanted to look at was whether a prospect's visit to a team during draft season influenced whether that team eventually selected him.  To do this, I tabulated the 2013 results from Walter Football's team visit list, which I found very helpful.  I hope to do the same for previous years as well to improve the accuracy.


Here's a breakdown of which teams brought in which positions for visits, or were confirmed to have spoken with them at gatherings, as reported and collected by Walter Football.  Again, this is just what was reported, which is likely the reason for the disparities in numbers; most teams probably bring in about the same number each year.

The columns were conditionally formatted, with dark green values indicating that a team worked out that position more than the other teams in the same column did.  The same goes for the grand totals at the end:

There are some pretty clear indicators in there.  Some I can think of: New England and Philadelphia like to do their homework, Atlanta scouted a bunch of DBs (and picked one in the 1st round), Buffalo scouted a bunch of QBs (and, surprise, picked one in the 1st round), and almost every team met with DBs and DL.


Here's whether or not each team's 1st round pick had visited (for the record, 19 of the 31 known prospects, or 61%, did):

One guess as to why teams picking towards the end of the round had brought in fewer of the prospects they ended up selecting: those teams picked more by the "best player available" methodology and ended up with players they didn't initially believe they would have the opportunity to draft.


Exploratory Draft Data: Comparing pundits' mock drafts

As I have mentioned previously, the NFL Draft is now a major industry within the massive industry of professional football.  It is a field in which celebrities exist but which is accessible to all: people with a lifetime of experience can give their views next to someone who knows little about the subject, and you can even change your mind as often as you want as the draft season, which now runs from February to May, moves along.  This has both advantages and disadvantages for predicting where a prospect will be drafted.


What you ultimately want to do, to use the advantages and reduce the disadvantages, is aggregate mock draft predictions so that you rely less on any one person's judgment.  This new mock ranking should combine both what pick the prospect is predicted to be across the board and what teams the pundits think are good fits, but for right now it is just the draft value points.  This should help quantify the very qualitative process of what a lot of people feel makes sense in terms of fit, which is important as well.


If someone were to expand on this start, I'd suggest they get a much broader array of pundits.  I only had time to collect a couple before completing this project.  There are literally thousands of people willing to give their opinions of where they think prospects should be drafted.


One thing I do want to note is that, in order to numerically compare where these prospects should be drafted, I used the method that is generally accepted as the standard for draft pick value, the Jimmy Johnson Draft Value Chart.  It has been used since the 1990s to numerically compare draft pick trades between teams, so in my opinion it describes draft value more accurately than a pick number alone.  I decided to use what the NFL ultimately uses, since I want the prediction to be as accurate as possible.  Incorporating a truer draft value chart would make sense if one were available, so until then the old coach of my Miami Hurricanes will continue to be the way the game is defined.
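
To make the pick-to-points conversion concrete, here is a minimal sketch of a chart lookup plus a consensus value averaged across several mock drafts. The first-round values are the ones commonly published for the Jimmy Johnson chart (3000 for pick 1 down to 590 for pick 32); double-check them against whichever copy of the chart you rely on. The function names and the example picks are hypothetical.

```python
# First-round values of the Jimmy Johnson chart as commonly published.
# Treat this table as an assumption and verify it before relying on it.
JJ_ROUND_ONE = [
    3000, 2600, 2200, 1800, 1700, 1600, 1500, 1400,
    1350, 1300, 1250, 1200, 1150, 1100, 1050, 1000,
    950, 900, 875, 850, 800, 780, 760, 740,
    720, 700, 680, 660, 640, 620, 600, 590,
]


def draft_value(pick):
    """Return chart points for a first-round pick number (1-32)."""
    if not 1 <= pick <= len(JJ_ROUND_ONE):
        raise ValueError("this sketch only covers picks 1-32")
    return JJ_ROUND_ONE[pick - 1]


def consensus_value(mock_picks):
    """Average the chart points implied by several mocks' picks for one prospect."""
    return sum(draft_value(pick) for pick in mock_picks) / len(mock_picks)


# Three hypothetical mocks slot the same prospect at picks 3, 5 and 7.
print(consensus_value([3, 5, 7]))  # 1800.0
print(draft_value(5))              # 1700
```

Because the chart drops off steeply at the top, averaging points (1800) is not the same as converting the average pick (pick 5, worth 1700), which is one reason to aggregate in draft value points rather than in pick numbers.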



Comparing mock drafts based on draft pick value:


I collected the final mock predictions of the following draft pundits for any years I could find between 2008 and 2013 within my limited timeframe.

Some had 2 years' worth of data, some had 5 years'.  Obviously the more years of data, the more accurate the evaluation of the pundit would be, but this is what I could collect.  If this were expanded, I would collect as many years back as I could from many sources.

To compare their accuracy, I fit a linear regression line for each pundit and compared the R-square values.  What this essentially tells me is what share of the variation in the prospects' actual 1st round draft value is explained by each pundit's prediction alone.  So if Eric Fisher is worth 3000 points as the first pick and Matt Elam is worth 590 points as the last pick of the round, how close could I come using the pundit's corresponding draft value prediction as the only variable?
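
For readers who want to reproduce the comparison, a one-variable least-squares fit and its R-square can be computed directly. This is a minimal sketch with made-up numbers, not the tool or data used for the plots in this post; `r_squared` is a hypothetical helper name.

```python
def r_squared(predicted, actual):
    """R-square of a least-squares line that explains `actual`
    (real draft value) from `predicted` (a pundit's draft value)."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    ss_res = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predicted, actual)
    )
    return 1 - ss_res / syy


# Invented draft value points for five prospects.
pundit_points = [3000, 2200, 1500, 1000, 590]
actual_points = [3000, 1500, 2200, 900, 620]
print(round(r_squared(pundit_points, actual_points), 2))
```

An R-square of 1 would mean the pundit's values line up perfectly with the actual values along a straight line; lower values mean more of the actual draft value is left unexplained.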


Here is one full example, that of the popular Draft godfather himself, Mel Kiper:

This is both a good visual and a good numerical representation of the pundit's accuracy.  For the circled blue point, Mel guessed a value of around 2200, but the prospect was really "worth" only about 1500.  So Mel overestimated this prospect's draft position that year.  The red line is the linear regression fit line, and in an ideal world it would run diagonally from the bottom left corner to the top right.  The further the line deviates from that diagonal, the less accurate the predictions.  Also, the correlation value at the bottom tells me numerically how closely an increase in Mel's predicted draft value is associated with an increase in the prospect's actual draft value.


Here is the full list of pundits, ordered from most accurate to least, along with how many observations I collected of each:

These aren't perfect comparisons because I couldn't find every pundit's mock for every year (although if I only wanted to compare 2013, I probably could rank them accurately as of last year).  It is more for generalizations about overall accuracy.  Really what this says is that Todd McShay is better at predicting a prospect's eventual draft value on JJ's chart than his contemporary at ESPN, the original draft don, Mel Kiper.




Exploratory Draft Data: Prospect Physical Measurements - Part II

In the last post I broke down WR prospect measurements going back to the 2008 draft.  I focused on WR because it was easy for me to explain a specific position and come away with insights into what to look for when drafting that position.


Below I just want to point out other instances that stood out to me over the different positions with speculative insights I've gathered.



Positions where a faster 40 yard sprint time was more associated with a better "Career Value per Year" (again as determined by PFF):

Think of this insight in terms similar to baseball's WAR (wins above replacement) metric: at positions with stronger correlations (values closer to -1), prospects who are faster than the average prospect will do 'more better' (yes I just used that, I thought it was an apt descriptor) than faster prospects at positions with weaker correlations (values closer to 0).  So a faster fullback (FB) prospect in the 40 yard sprint will typically provide more value to your team compared to his peers than a faster quarterback (QB) prospect versus his peers.  This of course only considers the 40 yard sprint time as an indicator; it isn't saying you should draft a fast FB over a fast QB when everything else is also considered.
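
A per-position correlation like the one described can be sketched as below. The rows are invented and the column meanings (40 yard time in seconds, career value per year) are assumptions about how the data is laid out. A value near -1 means lower (faster) times go with higher career value.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_by_position(rows):
    """Group (position, forty_time, career_value) rows by position and
    correlate 40 yard time with career value inside each group."""
    grouped = {}
    for position, forty_time, career_value in rows:
        times, values = grouped.setdefault(position, ([], []))
        times.append(forty_time)
        values.append(career_value)
    return {pos: pearson(times, values) for pos, (times, values) in grouped.items()}


# Hypothetical prospects, for illustration only.
prospects = [
    ("FB", 4.60, 3.0), ("FB", 4.75, 2.0), ("FB", 4.90, 1.0),
    ("QB", 4.60, 2.0), ("QB", 4.75, 3.0), ("QB", 4.90, 1.5),
]
for position, corr in correlation_by_position(prospects).items():
    print(position, round(corr, 2))
```

In this toy data the FB correlation sits at -1 while the QB correlation is much closer to 0, which mirrors the fullback versus quarterback reading above. With only a handful of prospects per position, a single outlier can swing the value.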

One observation I found interesting is that speed matters more for the positions that line up closer to the "inside" of a formation.  By "inside" I don't mean who is closer to the ball in the normal football sense of yards away (as in a DT is closer than a LB, who is closer than a FS); I mean who is closer to the middle of the formation if it were divided vertically (as in a DT is closer to the middle than a DE, a LB is closer than a CB, etc.).  I'm going to guess that this is because speed is more important in the middle of the formation than on the outside: many times the ball starts in the middle and a play is run to the outside, so the faster a DT or middle LB gets to the outside on a quick throw to the WR, the better.  When the RB gets a carry and there is no hole to run through between his linemen, a faster RB who can break it to the outside is better than a slower one, whereas a WR is already on the outside, so his speed is less important to his position.

Where the data doesn't fit this theory is the safety position.  Not only does SS have the lowest correlation among positions where a faster 40 yard time indicates a better player (meaning a faster player isn't that much better than an average one), but the FS position actually shows that a slower prospect is better than an average prospect.  This could be due to the small sample size or one outlier that is very good and also slow, but success at the safety position seems to be the least dependent on speed of the position groups analyzed.


Positions where it is better to be 'quick' than 'fast':

In the last post I mentioned that it was better for a WR to be 'quick' than 'fast' (meaning there's a stronger association between 10 yard sprint times and NFL career value per year than between 40 yard sprint times and career value).  Here I did the same breakdown for every position.  In the rightmost column you'll find the better attribute, which was derived by taking the difference between the two correlations.  From this, it is better to be 'quick' than 'fast' for WR, DE and FS.
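
The 'quick' versus 'fast' label can be sketched as a comparison of two correlations for one position. The post does not say in which direction the difference was taken, so this sketch assumes the attribute whose sprint time has the more negative correlation with career value wins. All numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )


def better_attribute(ten_yard, forty_yard, career_value):
    """Return 'quick' if 10 yard times track career value more strongly
    (more negative correlation) than 40 yard times, else 'fast'."""
    difference = pearson(ten_yard, career_value) - pearson(forty_yard, career_value)
    return "quick" if difference < 0 else "fast"


# Hypothetical prospects at a single position.
ten_times = [1.50, 1.55, 1.60, 1.65]
forty_times = [4.40, 4.55, 4.45, 4.60]
values = [4.0, 3.0, 2.0, 1.0]
print(better_attribute(ten_times, forty_times, values))  # quick
```

A small difference between the two correlations should not be over-read, especially at positions with few prospects.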


Positions where it is better to simply be taller or heavier:

Here the correlations per position are ordered by where it helps the most to be taller than average.  Again the safety position is perplexing since it is better to be taller for a strong safety but better to be shorter for a free safety, but I think it's once again because of the small sample size.

Here the correlations per position are ordered by where it helps the most to be heavier than average.

From these two considerations alone, it is better to draft DE and SS that are bigger, as both height and weight are positive indicators of better NFL performance.


Positions where it is better to be able to jump higher:

Again, safety continues to be such a weird position.  I should have the best idea about it out of any of them since it's the position I played professionally... well, if your profession is high school student.  Oh well, basically it is better to be able to jump higher in the NFL, except if your job is to run the ball, in which case you want to stay as low to the ground as possible.


I could honestly continue and do an entire project on observations based solely on physical measurements.  I think it's important to know what characteristics are good indicators of success at each position because the NFL is a copycat league and teams want to draft players that fit these stereotypes.  So a prospect who is stereotyped as physically able to perform well in the NFL will typically be drafted higher than a player whose measurements are not often associated with success.  I encourage others to take this premise (comparing PFF-style grades per year to physical attributes) and expand on it.  Get undrafted player info or prospects prior to 2008, or make new, more descriptive metrics, and improve on this analysis.  I'm sure it will be useful to people who make decisions based on this information.  You can never go wrong with more relevant data.  I'll help by posting my data after I turn in this current draft prediction project.