Thoughts on trying to predict the NFL Draft

Analyzing the relationships between the variables of NFL draft prospects was very interesting, and I could have kept digging forever, but ultimately I needed to move on and try to create a prediction formula for the draft selections.

My first attempt was to predict the aforementioned draft value of each prospect given their physical measurements, top 100 composite ranking score, aggregated mock draft scores and how many visits they made.  It did pretty well on the accuracy measure I described in my earlier post comparing the NFL pundits' predictions.  I do want to reiterate that I graded them on pick selection accuracy, not team-based accuracy, so it is more a reflection of what pick number the prospect should be than which team they should go to.  It's just a quick way to make a general suggestion of what should happen.

Before, I looked at the pundits' predicted vs. actual JJ Draft Value plots and ranked them by how much of the actual draft value an algorithm could predict given their previous mock draft results.  Here is the prediction formula's plot; it performed well, with an R-square value of XXX.  This would place it above the highest-ranking expert I looked at, Todd McShay, who had an R-square value of 0.76.  Not bad for someone who has no experience scouting and no inside NFL knowledge.

Anyways, I also graded the prediction formula against the pundits a little differently.  This time, instead of using JJ Draft Value as the metric, I used pick selection for the 2013 Draft.  So I compared the prediction formula's projected picks to the experts' picks.  Here are the experts' picks:

You can see that Mel Kiper did much better in 2013 than he has done in his career, but Todd McShay is still doing better.  By the way I don't mean to pick on Mel or anything, he's just the most recognizable NFL Draft expert.  Anyways, below are my picks versus the actual selections in the 2013 Draft:

What I think this shows is that if you use even a very basic prediction algorithm of what pick a prospect is going to be selected at, rather than trying to predict what team will select them, you can get a better sense of a prospect's draft value than even some of the experts in the field.  

I would love to see people take ideas from this project and expand on the research; I'm sure it would be valuable to any NFL front office.  I will post my data after the project is complete.  In the next post, I will attempt to predict the 2014 draft order.

Exploratory Draft Data: Top 100 rankings

I deviated from the last topic I was going to dive into at the outset of the project.  Instead of looking at the college statistics of each individual I thought it would be more prudent to study the overall rankings of each player.  College stats wouldn't be the best predictors because they're heavily dependent on the scheme and caliber of teammates each prospect had, whereas overall rankings project them compared to other prospects at their respective positions.  I think there will be some crossover effects with the mock draft rankings, but mock drafts project fit with teams and aren't true ordered rankings.

To create a composite style ranking similar to this one, I collected the top 100 rankings of 10 different draft pundits, averaged out their rankings to get a consensus and then ranked them.  Below are the top 50 results from 2013, since my test draft prediction will be run on that draft.

Pundits used:

Mike Mayock
NE Draft
Walter Football
Blogging the Boys
Gil Brandt
New Era Scouting
NFL Draft Scout
Matt Miller
Daniel Jeremiah

Player                  College             Rank  Avg
Luke Joeckel            Texas A&M           1     1.8
Eric Fisher             Central Michigan    2     3.1
Dion Jordan             Oregon              3     4.6
Sharrif Floyd           Florida             4     6.9
Chance Warmack          Alabama             5     7.2
Star Lotulelei          Utah                6     8.1
Lane Johnson            Oklahoma            7     8.8
Dee Milliner            Alabama             8     9.2
Ezekiel Ansah           Brigham Young       9     10.6
Jonathan Cooper         North Carolina      10    11.1
Sheldon Richardson      Missouri            11    11.2
Tavon Austin            West Virginia       12    13.22
Barkevious Mingo        LSU                 13    13.8
Kenny Vaccaro           Texas               14    15
Bjoern Werner           Florida State       15    17.9
Geno Smith              West Virginia       16    19.4
Jarvis Jones            Georgia             17    20.8
Tyler Eifert            Notre Dame          18    21.2
Xavier Rhodes           Florida State       19    21.4
Cordarrelle Patterson   Tennessee           20    21.4
Sylvester Williams      North Carolina      21    26.7
Alec Ogletree           Georgia             22    27.4
D.J. Fluker             Alabama             23    27.5
Cornellius Carradine    Florida State       24    28.7
Desmond Trufant         Washington          25    29.2
Jonathan Cyprien        Florida Int'l       26    30.9
Keenan Allen            California          27    31.7
Datone Jones            UCLA                28    32.2
Manti Te'o              Notre Dame          29    33.3
DeAndre Hopkins         Clemson             30    33.5
Arthur Brown            Kansas State        31    34.1
Damontre Moore          Texas A&M           32    36.9
Eddie Lacy              Alabama             33    37.11
Matt Elam               Florida             34    38.11
Johnthan Banks          Mississippi State   35    38.25
Kevin Minter            LSU                 36    39.3
Kawann Short            Purdue              37    40.4
D.J. Hayden             Houston             38    40.9
Jamar Taylor            Boise State         39    42.1
Robert Woods            USC                 40    42.3
Matt Barkley            USC                 41    42.88
Jesse Williams          Alabama             42    43.2
Eric Reid               LSU                 43    45.1
Zach Ertz               Stanford            44    45.5
Menelik Watson          Florida State       45    45.71
Justin Hunter           Tennessee           46    45.8
Alex Okafor             Texas               47    46.71
Johnathan Hankins       Ohio State          48    49.33
Larry Warford           Kentucky            49    51.22
Kyle Long               Oregon              50    51.44
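
For anyone who wants to rebuild a composite like the table above, here is a minimal Python sketch of the averaging step.  The pundit names and ranks in it are made up for illustration, and skipping a pundit who left a player off his list (rather than penalizing the player) is an assumption on my part, not a documented rule of this project.

```python
from statistics import mean

# Hypothetical top-100 lists: pundit -> {player: rank}. Names and ranks are made up.
pundit_ranks = {
    "pundit_a": {"Player X": 1, "Player Y": 2, "Player Z": 4},
    "pundit_b": {"Player X": 2, "Player Y": 1, "Player Z": 3},
    "pundit_c": {"Player X": 1, "Player Y": 3},  # left Player Z off the list
}

def composite_ranking(pundit_ranks):
    """Average each player's rank over the pundits who listed him, then order by that average."""
    by_player = {}
    for ranks in pundit_ranks.values():
        for player, rank in ranks.items():
            by_player.setdefault(player, []).append(rank)
    # Assumption: a pundit who did not list a player is simply skipped.
    averages = {player: mean(ranks) for player, ranks in by_player.items()}
    ordered = sorted(averages.items(), key=lambda item: item[1])
    return [(position, player, round(avg, 2))
            for position, (player, avg) in enumerate(ordered, start=1)]

for position, player, avg in composite_ranking(pundit_ranks):
    print(position, player, avg)
```

Each tuple holds the composite rank, the player and the average rank, which mirrors the Rank and Avg columns of the table.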

Exploratory Draft Data: Evaluating team visits

Another aspect I wanted to look at was whether a prospect visiting a team during Draft season influenced whether that team eventually selected him.  To do this, I found Walter Football's team visit list very helpful and tabulated the results for 2013.  I hope to do the same for previous years as well to improve the accuracy.

Here's a breakdown of which teams brought in which positions for visits, or were confirmed to have spoken with them at gatherings, as reported and collected by Walter Football.  Again, this is just what was reported, which is likely the reason for the disparities in the numbers; most teams probably bring in about the same number each year.

The columns were conditionally formatted, with dark green values indicating that a team worked out that position more than the other teams in the same column did; the same goes for the grand totals at the end:

There are some pretty clear indicators in there.  Some I can think of: New England and Philadelphia like to do their homework, Atlanta scouted a bunch of DBs (and picked one in the 1st round), Buffalo scouted a bunch of QBs (and, surprise, picked one in the 1st round), and almost every team met with DBs and DLs.

Here's whether each team's 1st round pick visited (for the record, 19 of the 31 known prospects, or 61%, did):

One guess as to why teams picking towards the end of the round hadn't brought in as many of the prospects they ended up selecting is that those teams leaned more on the "best player available" methodology and ended up with players they didn't initially believe they would have the opportunity to draft.

Exploratory Draft Data: Comparing pundits' mock drafts

Like I have mentioned previously, the NFL Draft is now a major industry that exists within the massive industry of professional football.  It is a field in which celebrities exist but which is accessible to all: people with a lifetime of experience can give their views next to someone who knows little about the subject, and you can even change your ideas as much as you want as the draft season, which now runs from February to May, moves along.  This has both advantages and disadvantages for predicting where a prospect will be drafted.

What you ultimately want to do, to use the advantages and reduce the disadvantages, is aggregate mock draft predictions to reduce the reliance on any one person's judgment.  This new mock ranking should combine both what pick the prospect is predicted to be across the board and what teams the pundits think are good fits, but for right now it is just the draft value points.  This should help quantify the very qualitative process of what a lot of people feel makes sense in terms of fit, which is important as well.

If someone were to expand on this start, I'd suggest they get a much broader array of pundits; I only had time to collect a couple before completing this project.  There are literally thousands of people willing to give their opinions of where they think prospects should be drafted.

One thing I do want to note is that in order to numerically compare what pick these prospects should be drafted I used the method that is generally accepted as the standard draft pick value, the Jimmy Johnson Draft Value Chart.  It has been used since the 1990s to come up with a way to numerically compare draft pick trades between teams, so it more accurately describes draft value than just a number slot in my opinion.  I decided to use what the NFL ultimately uses, since I want the prediction to be as accurate as possible.  Incorporating a truer draft value chart would make sense if one were available, so until then the old coach of my Miami Hurricanes will continue to be the way the game is defined.
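
To make the aggregation idea concrete, here is a minimal Python sketch that turns each pundit's predicted pick into chart points and averages them per prospect.  The pundits, prospects and picks are made up, and only the values for picks 1 and 32 are quoted in this post; the other chart values are the ones commonly published for the Jimmy Johnson chart and should be checked against a full copy before being relied on.

```python
from statistics import mean

# Excerpt of the Jimmy Johnson chart (pick -> points). Only picks 1 and 32
# appear in this post; the rest are assumed from the commonly published chart.
JJ_VALUE = {1: 3000, 2: 2600, 3: 2200, 4: 1800, 5: 1700, 32: 590}

# Hypothetical final mock drafts: pundit -> {prospect: predicted pick}.
mocks = {
    "pundit_a": {"Prospect X": 1, "Prospect Y": 3},
    "pundit_b": {"Prospect X": 2, "Prospect Y": 5},
}

def aggregate_mock_value(mocks, chart):
    """Average the chart points that each pundit's predicted pick implies for a prospect."""
    points = {}
    for picks in mocks.values():
        for prospect, pick in picks.items():
            points.setdefault(prospect, []).append(chart[pick])
    return {prospect: mean(values) for prospect, values in points.items()}

print(aggregate_mock_value(mocks, JJ_VALUE))
```

Averaging points instead of raw pick numbers means a miss near the top of the round counts for more than the same miss near the bottom, which is the reason for using the chart in the first place.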

Comparing mock drafts based on draft pick value:

I collected the final mock predictions of the following draft pundits for any years I could between 2008 and 2013 within my limited timeframe

Some had 2 years' worth of data, some had 5 years'.  Obviously the more years' worth of data, the more accurate the evaluation of the pundit would be, but this is what I could collect.  If I expanded this, I would collect as many years back as I could from many sources.

To compare their accuracy, I fit a linear regression line for each pundit and compared the R-square values.  What this essentially tells me is what percentage of the variation in the prospects' 1st round draft value you could get right with just each pundit's prediction.  So if Eric Fisher is worth 3000 points as the first pick and Matt Elam is worth 590 points as the last pick, how close could I come using the pundit's corresponding draft value prediction as the only variable considered?
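
For readers who don't have JMP, the R-square of a one-variable fit can be sketched in plain Python.  This is only an illustration of the statistic, not the JMP output itself, and the predicted and actual values below are invented.

```python
def r_squared(predicted, actual):
    """R-square of a least-squares line (with intercept) fitting actual on predicted.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, actual))
    return (sxy ** 2) / (sxx * syy)

# Hypothetical draft value points: a pundit's mock vs. what actually happened.
mock_value = [3000, 2600, 2200, 1800, 1700]
actual_value = [3000, 2200, 2600, 1700, 1800]
print(round(r_squared(mock_value, actual_value), 3))
```

A value of 1 would mean the pundit's predicted draft value lines up perfectly with the actual one, and a value near 0 would mean it tells you almost nothing.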

Here is one full example, that of the popular Draft godfather himself, Mel Kiper:

This is both a good visual and a good numerical representation of the pundit's accuracy.  For the circled blue point, Mel guessed around 2200, but the prospect was really "worth" only about 1500.  So Mel overestimated this prospect's draft position that year.  The red line is the linear regression fit line and would go diagonally from the bottom left corner to the top right in an ideal world.  The further the line strays from that diagonal, the further off the accuracy is.  Also, the correlation value on the bottom lets me know numerically how closely an increase in Mel's predicted draft value is associated with an increase in the prospect's actual draft value.

Here is the full list of pundits, ordered from most accurate to least, along with how many observations I collected of each:

These aren't perfect comparisons because I couldn't find every pundit's mock for every year (although if I wanted to compare just 2013, I probably could rank them accurately as of last year).  It is more for generalizations about overall accuracy.  Really what this says is that Todd McShay is better at predicting a prospect's eventual draft value based on JJ's chart than his contemporary at ESPN, the original draft don, Mel Kiper.

Exploratory Draft Data: Prospect Physical Measurements - Part II

In the last post I broke down WR prospect measurements going back to the 2008 draft.  I focused on WR because it was easy for me to explain a specific position and come away with insights into what to look for when drafting that position.

Below I just want to point out other instances that stood out to me over the different positions with speculative insights I've gathered.

Positions where a faster 40 yard sprint time was more associated with a better "Career Value per Year" (again as determined by PFF):

You should think of this insight in terms similar to the baseball statisticians' metric WAR (wins above replacement): at positions with stronger correlations (value closer to -1), prospects who are faster than the average prospect will do 'more better' (yes I just used that, I thought it was an apt descriptor) than faster prospects at positions with weaker correlations (value closer to 0).  So a fullback (FB) prospect who is faster in the 40 yard sprint will typically provide more value to your team compared to his peers than a faster quarterback (QB) prospect will versus his peers.  This of course only considers the 40 yard sprint time as an indicator; it isn't saying you should draft a fast FB over a fast QB when everything else is also considered.

One observation I found interesting is that being faster matters more for the positions that line up closer to the "inside" of a formation.  By that I don't mean who is closer to the ball in normal football terms of yards away (as in a DT is closer than a LB who is closer than a FS), but who is closer to the middle of the formation if it were divided vertically (as in a DT is closer to the middle than a DE, a LB is closer than a CB, etc.).  I'm going to guess that this is because speed is more important to a position in the middle of the formation than at the outside because many times the ball starts in the middle and a play is run to the outside, so the faster a DT or middle LB is getting to the outside on a quick throw to the WR, the better.  When the RB gets a carry and there is no hole to run through between his linemen, a faster RB that can break it to the outside is better than a slower one, whereas a faster WR is already on the outside so his speed is less important to his position.

Where the data doesn't fit this theory is the safety position.  Not only does SS have the lowest correlation among positions where a faster 40 yard time indicates a better player (meaning a faster player isn't that much better than an average one), but the FS position actually shows that a slower prospect is better than an average prospect.  This could be due to the small sample size or one outlier that is very good and also slow, but success at the safety position seems to be the least dependent on speed of the position groups analyzed.

Positions where it is better to be 'quick' than 'fast':

Similar to how I mentioned in the last post that it was better for WR to be 'quick' than 'fast' (meaning there's a stronger association between 10 yard sprint times and NFL career value per year than between 40 yard sprint times and career value), I also did the breakdown per position.  In the rightmost column you'll find the better attribute, which was derived from taking the difference in the correlations.  From this, it is better to be 'quick' than 'fast' for WR, DE and FS.
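
The 'quick' versus 'fast' comparison can be sketched in Python as below.  The sprint times and career values are made up, and the rule of calling a position 'quick' when the 10 yard correlation is more negative than the 40 yard one is my reading of the difference in correlations described above, so treat it as an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def better_attribute(ten_yard, forty_yard, career_value):
    """Return 'quick' if the 10 yard time is more negatively correlated with value, else 'fast'."""
    # Lower times are better, so the stronger association is the more negative correlation.
    difference = pearson(ten_yard, career_value) - pearson(forty_yard, career_value)
    return "quick" if difference < 0 else "fast"

# Hypothetical prospects at one position.
ten_yard = [1.50, 1.55, 1.60, 1.65]
forty_yard = [4.40, 4.55, 4.45, 4.60]
career_value_per_year = [8.0, 6.0, 5.0, 2.0]
print(better_attribute(ten_yard, forty_yard, career_value_per_year))
```

Running the same function once per position on real data would reproduce the kind of rightmost column described above.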

Positions where it is better to simply be taller or heavier:

Here the correlations per position are ordered by where it helps the most to be taller than average.  Again the safety position is perplexing since it is better to be taller for a strong safety but better to be shorter for a free safety, but I think it's once again because of the small sample size.

Here the correlations per position are ordered by where it helps the most to be heavier than average.

From these two considerations alone, it is better to draft DE and SS that are bigger, as both height and weight are positive indicators of better NFL performance.

Positions where it is better to be able to jump higher:

Again safety continues to be such a weird position.  I should have the best idea about it out of any of them since it's the position I played professionally... well, if your profession is high school student.  Oh well, basically it is better to be able to jump higher in the NFL, except if your job is to run the ball, in which case you want to stay as low to the ground as possible.

I could honestly continue and do an entire project on observations based solely on physical measurements.  I think it's important to know what characteristics are good indicators of success at each position because the NFL is a copycat league and teams want to draft players that fit these stereotypes.  So a prospect that is stereotyped to be able to physically perform well in the NFL will typically be drafted higher than a player whose measurements are not often associated with success.  I encourage others to take this premise (comparing PFF-style grades per year to physical attributes) and expand on it.  Get undrafted player info and prospects prior to 2008, or make new, more descriptive metrics and improve on this analysis; I'm sure it will be useful to people that make decisions based on this information.  You can never go wrong with more relevant data.  I'll help by posting my data after I turn in this current draft prediction project.

Exploratory Draft Data: Prospect Physical Measurements - Part I

Perhaps the easiest way to begin the process of predicting when a prospect will be drafted is to look back at the recent history of draft picks and see how the current year's players compare to those in the past from a physical standpoint.  All the draft history data was found at Pro-Football-Reference and all the prospect measurements were found at Mock Draftable.

Mock Draftable has an ingenious way of comparing prospects to others with the same physical attributes by creating a "star" graph with each category (height, weight, 40 time, etc.) on an axis by percentile within each position.  I've had similar ideas about creating this type of star graph going back to 2009, so I'm glad to see that someone else thinks similarly and has actually created it.  The general thesis is that this type of graph provides a well-balanced overview of many different numbers in a single graph.  A perfect prospect measured on a star graph with 8 different attributes would look like an octagon, since hitting 100% on each axis would create the uniform shape.
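
As a rough illustration of what "percentile within each position" could mean for one axis of such a graph, here is a small Python sketch.  The formula (share of same-position peers a prospect matches or beats) and the measurements are my own assumptions for illustration; I don't know how Mock Draftable actually computes its percentiles.

```python
def percentile_within_position(value, peers, lower_is_better=False):
    """Percent of same-position measurements that this value matches or beats.

    Assumed definition for illustration only. `peers` should include every
    measurement at the position, the prospect's own included.
    """
    if lower_is_better:
        matched_or_beaten = sum(1 for peer in peers if peer >= value)
    else:
        matched_or_beaten = sum(1 for peer in peers if peer <= value)
    return 100.0 * matched_or_beaten / len(peers)

# Hypothetical WR measurements.
heights_in = [70, 72, 73, 74, 76]
forty_times = [4.35, 4.45, 4.50, 4.55, 4.65]

print(percentile_within_position(74, heights_in))                          # taller is better
print(percentile_within_position(4.45, forty_times, lower_is_better=True))  # faster is better
```

One such percentile per attribute gives the length of each spoke on the star, and a prospect at 100 on all 8 spokes would trace the full octagon.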

So instead of starting by comparing prospects' physical measurements to those of recent past drafts, I'll let Mock Draftable do the hard work there and instead concentrate on something else in the beginning.

What I want to look at in the beginning is what specific measurements in each category (again height, weight, etc.) are better indicators of NFL performance; for instance, do 6'3" wide receivers typically fare better than 6'2" receivers?  I saved myself from doing the hard work of grading how well players have done since entering the NFL by relying on the invaluable website Pro Football Focus (PFF) for that information.  PFF has a team of game analysts who watch every player on every play for every team and grade each player's performance with as little bias as possible.  So to graph the players' worth thus far, I collected each of their career values (how good they have been) and divided it by the number of years they have been in the NFL to get a "Career Value per Year" score.

Let's start with a simple graph, showing the prospect's age graphed against the Career Value per Year (all data analysis was done in SAS's JMP software):
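
The "Career Value per Year" score and the height comparison can be sketched in a few lines of Python.  The receiver records below are invented, and the grouping step is just one simple way to compare 6'3" (75 inch) and 6'2" (74 inch) receivers, not the exact JMP analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical WR records: (height in inches, career value, years in the NFL).
receivers = [
    (75, 30.0, 5),
    (75, 12.0, 3),
    (74, 16.0, 4),
    (74, 6.0, 2),
]

def career_value_per_year(career_value, years_in_nfl):
    """Career value divided by the number of years the player has been in the NFL."""
    return career_value / years_in_nfl

def mean_value_by_height(records):
    """Average Career Value per Year for each height in the records."""
    groups = defaultdict(list)
    for height, career_value, years in records:
        groups[height].append(career_value_per_year(career_value, years))
    return {height: mean(scores) for height, scores in groups.items()}

print(mean_value_by_height(receivers))
```

Dividing by years keeps a player drafted in 2008 from looking better than one drafted in 2012 just because he has had more seasons to accumulate value.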

As you can see from the smooth line fit, drafting younger players is typically better than drafting older players.  There could be many theories as to why but I'll try to stay clear of causal theories in this statistical analysis.

Next I broke it down by positions.  This allows you to truly compare apples to apples, since we're comparing one WR to another WR.  This is the full breakdown for the WR position with some insights from the limited data set going back to the 2008 draft:

This shows the mean (average) Career Value per Year at each age when drafted, each height in inches and each weight in pounds.  There's a clear indicator that drafting younger WRs is better than drafting older ones, but height and weight are less clear.  There's no specific sweet spot for height and weight, but to answer my earlier question, yes, typically 6'3" WRs are better players than 6'2" WRs in the NFL.  For a much, much deeper dive into WR measurables, the fantasy football site Rotoviz has done some great work thus far.

This graph of arm lengths and hand sizes in inches is clearer.  Generally you want to draft WRs with longer arms and bigger hands.  Presumably, all other things neglected, this makes it easier for the WR to reach out and snare footballs thrown at them.

I found this subset particularly interesting.  It compares the times to complete sprints of 10, 20 and 40 yards.  Typically the 40 yard time is the most glamorized, and you can see via a simple linear regression that, as one would expect, it's better to draft a faster WR than a slower WR: the regression line increases in "Career Value per Year" as the times get smaller (or the prospects get faster).  But what stood out more to me was the correlations (how closely an increase or decrease in something on the X axis is associated with a corresponding increase or decrease on the Y axis).  As you can see, a faster 20 yard sprint time is more highly correlated with a better NFL career value per year than a 40 yard sprint time is, and so is a 10 yard sprint time.  What this indicates to me is that it's better to be quick (acceleration measured in 10 yard sprint time) than it is to be fast (speed measured in 40 yard sprint time).

Rounding out the WR analysis, you can see that it is better to have shorter times in the 3 cone and the 20 yard shuttle drills (these usually measure a prospect's change of direction speed) but that this isn't as good of a predictor as the straight line sprint times were.

To summarize, if you were going by solely averages since the 2008 draft, it is better to draft a young WR that has long arms and is more quick than fast, all other things considered.

Doing the impossible

For a project in a Data Mining class that I'm currently enrolled in I have to use data to predict an unknown value or object, given other data that may or may not relate to the subject in question.  Since I'm going to be spending a lot of time on this for the next week, and because I believe the more you're interested in something the better you'll do, I wanted to pick a subject that I was passionate about.  Given the timing, I want to try to do the impossible and predict the 1st round of the NFL Draft as the picks are made live on May 8.

I've always been fascinated by the NFL Draft.  The workings behind the scenes of player evaluation and selection are as interesting to me as the games themselves.  The draft and all that goes into it is something that people devote their lives to; it is even now shown live over 3 consecutive days on one of the most watched channels on TV in the primetime month of May.  Yet with all the attention paid to it by smart people all over the country, even the best pundits can only get 12 of the 32 selections correct.  I thought I'd give creating a mock draft in real time, as the picks are made, a shot by simply doing a data analysis project on it.

My plan is to use predictors such as an aggregated mock draft collection of various pundits across the country, player physical attributes, college statistics and confirmed NFL team visits to try and assign a probability to the top guess for each team.

Wish me luck.

Altruistic Punishment and Batman

Batman is one of my favorite superheroes and arguably one of the most popular ones as well.  The story has been told many times from different angles throughout the years but I most enjoy Christopher Nolan's renditions in the most recent trilogy.  The critical acclaim and box office success suggest that most everyone else agrees.

The reason most people are drawn to the story of Batman is that he is the most like any one of us, except for the fact that he was born into a family of billionaires.  He doesn't have any superpowers that he got from being bitten by a radioactive spider or because he was born on an alien planet; he just goes out there night after night beating up bad guys with his fists.  Oh and I guess also because he uses his billions of dollars to create fancy weaponry that the U.S. Army doesn't even have, but that is beside the point I'm trying to make.  As Nolan notes several times in the most recent films, Batman is an idea whereas Bruce Wayne is just a man.

Batman to me is a symbol of what Nobel laureate Daniel Kahneman refers to in his seminal masterpiece on behavioral psychology, Thinking, Fast and Slow, as "the glue that holds societies together".  It comes up in the chapter entitled "Bad Events", where Kahneman is describing the concept of altruistic punishment.  He says that experiments looking at MRI results found that people who punish others for what they did to someone else actually show increased activity in the "pleasure centers" of the brain.  He notes, "It appears that maintaining the social order and the rules of fairness in this [way] is its own reward."

It would seem that Batman is fighting crime anonymously all those years because it actually gives him pleasure, that it makes him feel good to right the wronged.  He enjoys punishing criminals that unfairly get to walk because of an unjust legal system.  To him, the personal punishment he takes is worth it because of the social good that could come out of it.

When you start to think of it that way, Nolan's version of the Batman series becomes even more interesting and, therefore, even better than you already thought it was.

// I have to add that when I went to Amazon to get the link for Thinking, Fast and Slow, I noticed that the Kindle version was only $2.99.  Three dollars!  Even though I already have it in hardcover, I can't pass up the opportunity to be able to take it wherever I go for only three bucks.

Why have I kept my PostHaven account?

Soooo I don't post often at all, I've posted two quick notes with one of them being about what I will post.  But every month I pay $5 to keep my PostHaven account active after contemplating discontinuing it.  I invariably come to the conclusion that I will start posting more but each month goes by and I don't.

So why do I keep paying $5 a month?

The truth is I'm a private person, an introvert who likes to keep most of my thoughts to myself, but I struggle with being more public.  On Facebook I only post funny pictures or things I think my friends will find interesting but not nearly as much as others.  On Twitter I predominately retweet sentiments I agree with or find funny because it's easier and most of the time somebody else can put into words what I'm thinking better than I can.  The only thing I posted semi-regularly on is a music blog I was using with some friends but that was mainly an easy way to share new music with them.

So why do I keep paying $5 a month when I struggle with posting at all?

Because I want to in the future.  I want to share my thoughts with whoever is willing to read them with the hope that they lead to different thoughts on the other end or are just found interesting.  I keep paying each month because I feel it'll force me to post more often than if the site were free.  That might not make sense to most but the small extrinsic motivation keeps me hopeful that I will start to change and open up.  So that's what I'll continue to do, hoping that one day I'll change my ways and become part of the person I wish to be.

And that starts by doing this

Google does it again!

I continue to love and be impressed by what comes out of Google nowadays.  Google Streetview Treks is only the latest new idea.  Being able to look at streetviews of actual hikes and places where the car can't go will only help to open the world's eyes to what is out there.

It's another in the long line of ideas that just makes sense when you first hear about it, like it should have been done a long time ago.  But that's how you know it's a great idea -- because everyone gets it.

They're also offering to lend the equipment to people and organizations that can get to those remote locations