I'm a big fan of the blog Priceonomics, and four years ago I posted about one of their data visualizations, which looked at how diverse major U.S. cities are using a metric that was new to me at the time: the Herfindahl–Hirschman Index, or HHI. HHI measures how evenly groups are represented in a market, and the formula is really simple. You just sum the square of each group's market share, expressed as a fraction between 0 and 1, so an evenly distributed market has a value of 1/(number of groups) and a complete monopoly by one group has a value of 1.
For a market with four groups, a perfectly diverse market would be:
(0.25)^2+(0.25)^2+(0.25)^2+(0.25)^2 = 0.25
and a complete monopoly by any one group would be:
(1.00)^2+(0)^2+(0)^2+(0)^2 = 1
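The two calculations above are easy to reproduce in code. Here's a minimal Python sketch, assuming shares are given as fractions that already sum to 1; the function name `hhi` is just an illustrative choice, not something from the original analysis.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared shares.

    Assumes `shares` are fractions (0 to 1) that already sum to 1.
    """
    return sum(s ** 2 for s in shares)


# The two four-group cases worked out above.
perfectly_diverse = hhi([0.25, 0.25, 0.25, 0.25])
complete_monopoly = hhi([1.00, 0.0, 0.0, 0.0])

print(perfectly_diverse)  # 0.25
print(complete_monopoly)  # 1.0
```

Lower scores mean a more even mix, and the floor is always 1/(number of groups).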
Since I've been playing around with College Scorecard and IPEDS (Integrated Postsecondary Education Data System) college data recently, I thought it'd be cool to apply the same HHI method to each state's flagship university. The thinking is that each flagship university is likely one of the most diverse colleges in its state, and is probably a good reflection of that state's diversity as well. So if I could find an HHI score for each state and each flagship university, I should be able to compare flagships to one another to see which are the most and least diverse, and also see which flagships are more or less diverse than their state's population. Lastly, I should be able to run a simple linear regression to see how well a state's diversity alone predicts its flagship's diversity.
**Disclaimer: I normalized the market share of each racial group across these five groups: White, Black, Hispanic, Asian, and Native American/Pacific Islander. This means I am not considering any student who identified as two or more races, non-resident alien, or unknown. I realize this affects the accuracy of the data -- especially leaving out the two-or-more-races responses -- but it was the cleanest way to compare the most prominent groups. Another note is that this research has already been done by others, including this fantastic interactive piece by Ben Myers in 2016 and this Hechinger Report piece from earlier in 2018.**
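To make the normalization in the disclaimer concrete, here's a rough Python sketch. The group keys and the percentages are made up for illustration (they are not College Scorecard field names or real school data); the point is only that dropping the excluded categories and rescaling makes the five shares sum to 1 while keeping their ratios to each other.

```python
# Hypothetical keys for the five groups kept in the analysis.
GROUPS = ["white", "black", "hispanic", "asian", "native_pacific"]


def normalize_shares(raw, groups=GROUPS):
    """Keep only `groups` and rescale so the kept shares sum to 1."""
    kept = {g: raw.get(g, 0.0) for g in groups}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no enrollment in any of the kept groups")
    return {g: value / total for g, value in kept.items()}


# Made-up percentages for one imaginary school (sums to 100).
raw = {
    "white": 60.0, "black": 10.0, "hispanic": 10.0, "asian": 15.0,
    "native_pacific": 1.0,
    # Categories the analysis leaves out:
    "two_or_more": 2.0, "nonresident": 1.0, "unknown": 1.0,
}

shares = normalize_shares(raw)
score = sum(s ** 2 for s in shares.values())  # HHI on the normalized shares
print(shares)
print(round(score, 4))
```

In this made-up case the White share goes from 60% of all students to 62.5% of the five-group total, which is why each group looks a bit larger after normalization.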
Let's start with the rankings of each state's flagship university, from most diverse to least diverse (perfectly diverse among 5 groups = 0.2, complete monopoly by any one group = 1):
Generally, the greater the share of White students, the less diverse that university is; this isn't a surprise. What I did find surprising is just how White some schools (and their respective states) are. Seeing it as a percent is one thing, but visualizing it on a bar graph really puts feeling behind that number. This is why I love good charts and data visualizations: they pack a feeling of scale or proportion that numbers alone can never achieve. [Again, keep in mind that I normalized the actual percentages across the five groups, so each group's actual percentage is smaller than it appears -- however, their proportions to each other remain the same.]
Another cool thing about the above graph is that it visually identifies "sister schools", or schools whose populations have roughly the same proportions (via the scientific "squint and see which colors are about equal" method). Theoretically, you could Truman Show someone: swap every other student at their school with the other school's population (save for a few close friends or familiar faces) and that student would never know the difference. However, that student would notice the difference if the student population of Rutgers (NJ) were replaced with that of New Hampshire. Completely useless, but a fun thought experiment.
Looking at this data leads to a whole other discussion about how we stereotype certain schools based on their populations. For instance, I'm not at all surprised that UNC and UVA have similar populations since they're in bordering states, but I am surprised that UConn has a very similar population mix.
Another case: it makes sense that ND and SD's flagships have nearly identical populations or that VT/NH/ME all are very similar, but ND/SD also are extremely similar to VT/NH/ME.
One last example: even though they are on opposite sides of the U.S., the University of Florida and the University of Arizona have very similar populations.
Most of the time these schools have very close HHI scores (meaning they are similarly diverse, sorry for pointing out the obvious), but interestingly there are cases where another school slips in between with very different proportions. This is interesting because the school in between (B) is technically closer in diversity to the top school (A) than the bottom school (C) is, yet the top and bottom schools (A and C) have much more similar student populations. For instance, the universities of Delaware and Kansas have very similar populations, but Minnesota and Mississippi are technically closer to Delaware in diversity score than Kansas is, even though Kansas' population mix is more similar.
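This oddity is easy to reproduce with toy numbers. In the sketch below, the three share lists are invented (they are not the real Delaware, Kansas, Minnesota, or Mississippi figures), and the composition measure, total variation distance (half the sum of absolute share differences), is a stand-in chosen for illustration in place of the "squint" method.

```python
def hhi(shares):
    """Sum of squared shares (shares are fractions summing to 1)."""
    return sum(s ** 2 for s in shares)


def composition_distance(p, q):
    """Total variation distance: 0 means identical mixes, 1 means disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Invented five-group shares for three imaginary schools.
school_a = [0.70, 0.10, 0.10, 0.08, 0.02]
school_b = [0.70, 0.02, 0.05, 0.20, 0.03]  # similar HHI, different mix
school_c = [0.72, 0.09, 0.10, 0.07, 0.02]  # similar mix to A

for name, shares in [("A", school_a), ("B", school_b), ("C", school_c)]:
    print(name, round(hhi(shares), 4))

# B sits between A and C by HHI score...
print(abs(hhi(school_a) - hhi(school_b)) < abs(hhi(school_a) - hhi(school_c)))
# ...but C is the closer match to A by actual composition.
print(composition_distance(school_a, school_c)
      < composition_distance(school_a, school_b))
```

Both comparisons print `True`: HHI collapses the whole mix into one number, so two schools can land on nearly the same score with quite different groups making up the non-majority share.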
I think there could be an economic principle buried in this closer-but-not-as-close oddity and the Truman Show-style thought experiment above but that's for another time.
Anyways... now let's look at how each flagship university's diversity compares with its state's diversity. The schools at the top of the diagram below are MORE diverse than their respective states; the ones at the bottom are LESS diverse than their states. [Note: State population figures have been normalized across the same five racial groups, again disregarding responses of two or more races or unknown, for comparison purposes.]
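One simple way to build that kind of ranking is to subtract each flagship's HHI from its state's HHI and sort by the result. The sketch below assumes that approach (the post doesn't spell out the exact calculation behind the diagram), and the scores are placeholders rather than real values.

```python
# Placeholder HHI scores, for illustration only (not values from this post).
scores = {
    "State A": {"state_hhi": 0.40, "college_hhi": 0.55},
    "State B": {"state_hhi": 0.62, "college_hhi": 0.58},
    "State C": {"state_hhi": 0.35, "college_hhi": 0.36},
}


def diversity_gap(state_hhi, college_hhi):
    """Positive gap: the flagship has the LOWER HHI, so it is more diverse."""
    return state_hhi - college_hhi


# Most "more diverse than its state" first, least last.
ranked = sorted(
    scores.items(),
    key=lambda item: diversity_gap(**item[1]),
    reverse=True,
)

for name, s in ranked:
    print(f"{name}: gap {diversity_gap(**s):+.2f}")
```

Because a lower HHI means more diversity, the sign is easy to flip by accident; a positive gap here always means the flagship is the more diverse of the two.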
As you can see, most flagships are not as diverse as their respective state's population. This is somewhat surprising to me, as I would've guessed it'd be more of a 50-50 split for some reason. In general, schools in the South (SEC football conference) need to do a MUCH better job of making their student populations more reflective of their states' diversity. And WTF Delaware??
Finally, let's test my theory that a state's diversity is a pretty good predictor of that state's flagship university's diversity. To do so, I'm going to see how correlated the state and flagship's HHI scores are. Plotting them on a scatter plot (with college.hhi depending on state.hhi) and finding the linear trendline and corresponding R-squared should do the trick.
As expected, they're highly correlated, with an R-squared of nearly 0.77. You can essentially read this number as "a state's diversity score explains nearly 77% of the variation in flagship universities' diversity scores". Even if it explained 100%, that wouldn't mean each group's proportion relative to the others is the same at the flagship as in the state, only that the flagship's overall diversity score could be predicted exactly from the state's. Again, not entirely useful, but interesting nonetheless.
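For anyone who wants to repeat the trendline and R-squared step without a spreadsheet, here's a minimal ordinary-least-squares sketch in plain Python. The variable names mirror `state.hhi` and `college.hhi` from above, but the six data points are invented placeholders, so the R-squared it prints is not the 0.77 from the real data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented placeholder scores: one entry per state/flagship pair.
state_hhi = [0.30, 0.42, 0.55, 0.63, 0.78, 0.85]
college_hhi = [0.38, 0.45, 0.52, 0.70, 0.74, 0.88]

slope, intercept = fit_line(state_hhi, college_hhi)
r2 = r_squared(state_hhi, college_hhi, slope, intercept)
print(f"college_hhi = {intercept:.3f} + {slope:.3f} * state_hhi")
print(f"R-squared = {r2:.3f}")
```

With the real data, `state_hhi` and `college_hhi` would each hold one score per state, in the same order.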
That's it: a quick look at how diverse each state's flagship university is. I'm attaching a publicly available version of the data I collected on this Google Sheet. I got each school's demographic data from College Scorecard (download link fyi), each state's data from the 2017 U.S. Census estimates, and a list of flagship universities by state from just Googling it.