Using Census Data Analysis to Find Targeted Markets

That's an SEO-approved headline if I've ever seen one.


I thought it'd be good to share a research project I did about 8 months ago for my Data Analysis class.  Maybe someone else can duplicate it in another industry and benefit from it.  The basic instructions for the project were to use exploratory data analysis to gain a business insight; it was an introductory class for business professionals, so it didn't involve very heavy lifting, but it was challenging nonetheless.  Since I was working for SolarCity at the time (and still am), I thought it'd be good to build the project around my work there, because benefits might emerge along the way.  Of course, I won't disclose any business information.


My goal was to pick a regional warehouse and find the most "attractive" areas on which to focus our sales efforts.  First I picked the smallest area I could find open US Census data for (the zip code) and found the top 50 zip codes where we have current customers.  My theory was that our future customers would more likely be similar to our current customers than not, so I decided to focus my data analysis on describing our current customer base.  The US Census will give you a shitload of data, as long as you painstakingly gather it in little chunks.  So that's what I spent most of my time on.


Below is an example of the finished product of collecting the data individually and organizing all of it.  I'd say this took about 2/3 of the total project time:


First I just played around with a portion of the data to make sure it made sense before proceeding.  To do so, I made an X-Y scatterplot of each Census data column against the total count of customers in each zip code; a positive correlation would mean you'd expect more customers in a zip code the higher it scores on that Census variable.  For some of the Census data columns I used the relative percent rather than the raw count, since that paints a better picture than total numbers.  Below is an example of linear regressions for a few of them, with neutral, positive and negative correlations:
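
For anyone who'd rather script this step than chart it by hand, here's a minimal Python sketch of the same idea: correlate each Census column with the customer count per zip code.  The column names and every number in it are made up for illustration (they aren't real Census or customer data), and Pearson correlation is an assumption on my part about what "correlation" means here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for five hypothetical zip codes (NOT real data).
customers = [12, 30, 45, 22, 60]
census_columns = {
    # Relative percents rather than raw counts, as described above.
    "pct_owner_occupied": [40.0, 55.0, 70.0, 50.0, 80.0],
    "pct_renter_occupied": [60.0, 45.0, 30.0, 50.0, 20.0],
}

# One correlation per Census column, against customer count per zip code.
correlations = {
    name: pearson(values, customers) for name, values in census_columns.items()
}

# Print the columns from most positive to most negative correlation.
for name, r in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: r = {r:+.2f}")
```

A positive r means zip codes that score higher on that column tend to have more customers; a negative r means the opposite.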


This yielded some interesting patterns.  I wanted to analyze it further, so I sorted the variables (each one a column of a particular statistic from the Census) into groupings of similar variables.  Yes, I added some personal bias to this analysis, but it was rarely hard to pick a grouping for a variable, so I'm confident I didn't inject too much bias.  Below are the groupings of the categories of different Census data:


Then I found the correlations by grouping.  This was particularly interesting, since it showed which variables were the strongest positive and negative indicators of more customers.  Below is the graph by grouping:
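
One way to do this kind of by-grouping summary in code is sketched below.  It assumes you already have one correlation per variable and a hand-assigned group for each variable, and it simply picks the most positive and most negative variable in each group.  The variable names, group names and correlation values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-variable correlations with customer count (NOT real results).
correlations = {
    "pct_owner_occupied": 0.62,
    "pct_renter_occupied": -0.62,
    "pct_age_18_to_24": -0.25,
    "pct_age_25_to_44": 0.05,
    "pct_age_45_to_64": 0.40,
}

# Hand-assigned grouping for each variable (this is where personal bias enters).
groups = {
    "pct_owner_occupied": "housing",
    "pct_renter_occupied": "housing",
    "pct_age_18_to_24": "age",
    "pct_age_25_to_44": "age",
    "pct_age_45_to_64": "age",
}

# Collect (correlation, variable) pairs under each group.
by_group = defaultdict(list)
for variable, r in correlations.items():
    by_group[groups[variable]].append((r, variable))

# For each group, keep the strongest positive and strongest negative indicator.
summary = {}
for group, pairs in by_group.items():
    highest_r, highest_var = max(pairs)
    lowest_r, lowest_var = min(pairs)
    summary[group] = {"highest": highest_var, "lowest": lowest_var}
    print(f"{group}: highest {highest_var} ({highest_r:+.2f}), "
          f"lowest {lowest_var} ({lowest_r:+.2f})")
```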



Within each grouping, I was able to create bar charts that showed the correlations as you went across the spectrum of each.  So if you graphed house value and there was a clear dividing line where the correlations flipped from positive to negative, you'd know up to what house value to advertise to, because anything more and you're wasting your efforts.  Below is an example of one of those charts, and it's not house value:
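
Finding that dividing line doesn't have to be done by eye.  Here's a small sketch that walks the buckets of one grouping in order and reports where the correlation first flips from positive to negative.  The house-value buckets and their correlations are invented purely to show the mechanics, and `find_sign_flip` is a hypothetical helper name.

```python
def find_sign_flip(ordered_correlations):
    """Return the pair of bucket labels around the first positive-to-negative
    flip in an ordered list of (label, correlation) pairs, or None if the
    correlations never flip that way."""
    pairs = zip(ordered_correlations, ordered_correlations[1:])
    for (prev_label, prev_r), (next_label, next_r) in pairs:
        if prev_r > 0 and next_r < 0:
            return prev_label, next_label
    return None


# Invented buckets, ordered from lowest to highest house value (NOT real results).
house_value_buckets = [
    ("under $150k", 0.10),
    ("$150k-$300k", 0.35),
    ("$300k-$500k", 0.20),
    ("$500k-$750k", -0.15),
    ("over $750k", -0.30),
]

flip = find_sign_flip(house_value_buckets)
if flip is None:
    print("No positive-to-negative flip found")
else:
    last_positive, first_negative = flip
    print(f"Correlation flips between {last_positive} and {first_negative}")
```

With these made-up numbers the flip falls between the third and fourth buckets, which is where you'd stop advertising.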


The final step was ranking the zip codes I studied by the number of predicted customers each should have.  This would show the most "attractive" zip codes on which to focus.  Also, by finding the difference between actual customer count and predicted customer count, you could find zip codes that are "underperforming" and target those first, since they should have more customers than they do.  Below is an example; the zip codes conditionally formatted red in the rightmost column would be underperforming:
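
Here's a rough sketch of that ranking step in Python.  It assumes the predicted count comes from a simple least-squares line fit on a single Census variable, which is just one plausible way to get a prediction.  The zip labels, the variable and all the numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data for five placeholder zip codes (NOT real data).
zip_codes = ["Z1", "Z2", "Z3", "Z4", "Z5"]
pct_owner_occupied = [40.0, 55.0, 70.0, 50.0, 80.0]
actual_customers = [12, 30, 45, 22, 60]

slope, intercept = fit_line(pct_owner_occupied, actual_customers)

rows = []
for zip_code, x, actual in zip(zip_codes, pct_owner_occupied, actual_customers):
    predicted = slope * x + intercept
    # A negative gap means fewer customers than the fit predicts.
    rows.append({"zip": zip_code, "predicted": predicted,
                 "actual": actual, "gap": actual - predicted})

# Most "attractive" zip codes first: highest predicted customer count.
ranked = sorted(rows, key=lambda row: row["predicted"], reverse=True)

# "Underperforming" zip codes: actual count below the predicted count.
underperforming = [row["zip"] for row in ranked if row["gap"] < 0]

for row in ranked:
    flag = "  <- underperforming" if row["gap"] < 0 else ""
    print(f"{row['zip']}: predicted {row['predicted']:.1f}, "
          f"actual {row['actual']}, gap {row['gap']:+.1f}{flag}")
```

The negative-gap rows play the same role as the red cells in the spreadsheet version.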



I thought the final product turned out great; I got a good grade on the project, but more importantly I hoped it would help SolarCity's sales efforts by targeting specific areas that are more "attractive" than others.  It would be a quantitative compass: not a perfect solution, but better than before.  After vetting the project with my professor and trusted confidants to make sure it made sense, I approached our sales managers and Direct Sales team (they're the ones that go around neighborhoods on foot, so I thought it'd benefit them the most to have data to back up their choices).  However, after initial enthusiasm, it fell off the map like other ideas.  It would've required a bigger and more accurate test and pilot project to prove its worth before rolling out, but I didn't see how it could hurt our efforts.  In my mind it was worth a try.  But as is often the case, there were other things on the immediate agenda; either that or I didn't talk to the right person.


Hopefully it's revisited in the future and I can expand on it more, this time while getting paid to do the analysis.  That'd be cool.  But I encourage anyone else who thinks they can benefit from similar analysis to do the same and see if it helps you out.