Spurious Correlations

It never ceases to amaze me how some researchers/consultants/journalists can find a correlation between two datasets and then jump to a conclusion that one is absolutely related to, or even causing, the other. At best it’s lazy; at worst, it can lead to bad decisions being made.

You see this frequently in political reporting, where it is used to convince us that one political party is bad or another is great.

Just one example, published recently by Dr Meenakshi Parameshwaran, a Research Associate at LKMco, shows a correlation between the geographic density of the 4 million people who voted UKIP in the recent UK election and that of areas in the UK where there is “education underperformance”. The article goes on to say: “It is time to wake up to the uncanny relationship between UKIP voting and education.”

It doesn’t take Einstein to deduce that what’s being implied here is that UKIP voters are stupid. This was further evidenced when the research went viral, with individuals all over the place retweeting the article with comments such as “told you so” and “this explains a lot”.

Well, actually it doesn’t ‘explain’ much at all. Apart from being offensive to anyone who voted UKIP, what’s also concerning is that using this lazy approach actually detracts from a more nuanced story.

If we look wider, we can correlate all manner of things with the density of UKIP voters, such as high unemployment, lower incomes, higher levels of immigration and greater housing need (versus the UK averages). It’s only when you look at these other datasets that also correlate with UKIP voting that you begin to get a more rounded story: one that would suggest there are higher levels of UKIP voters in areas that have been most affected by immigration.

Now, that begins to make more sense, doesn’t it? But perhaps it doesn’t play to the lazy stereotype that some have cultivated.

So, is it fair to imply then that UKIP voters are stupid? No. But then lazy journalism and lazy research analysis have never stopped people from presenting a somewhat spurious correlation as proof of causality in the quest to prove their own point or support their own agenda.

I stumbled across a website with other examples of spurious correlations, which make the point better than I ever could: relying purely on two related datasets (when there are many more to consider) to draw conclusions is lazy and could even be dangerous. It is well worth a look, as some of them are brilliant. I particularly like the one that correlates the number of films Nicolas Cage appears in with the number of people who drowned in swimming pools. These spurious correlations are also collected in a book of the same name by Tyler Vigen.
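One common way such coincidences arise is that two series simply drift in the same direction over time. The sketch below is only an illustration on made-up numbers (the series names, slopes and noise levels are my own assumptions, not data from the website or the book): two unrelated series that both trend upwards correlate almost perfectly, yet the correlation largely disappears once you compare their period-to-period changes instead of their raw levels.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diffs(xs):
    """Period-to-period changes, which strip out a steady trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 200

# Two hypothetical series that both drift upwards but share no causal link:
# each is a straight-line trend plus its own independent noise.
series_a = [1.0 * t + rng.gauss(0, 1) for t in range(n)]
series_b = [2.0 * t + rng.gauss(0, 1) for t in range(n)]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of raw levels:     {r_levels:.2f}")
print(f"correlation of period changes: {r_changes:.2f}")
```

Comparing changes rather than levels is just one simple check among several; it does not prove or disprove a causal link, but it does show how little a high correlation between two trending series tells you on its own.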

I’ve seen this lazy approach used by some ‘reputation advisers’, where a simple reputation measure is correlated with things such as product sales or market share. Perhaps the relationship does show cause and effect. But it’s only when you consider all the other possibilities, such as media sentiment, brand strength, competitor performance, employee engagement and market dynamics, that you get closer to the truth.
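To make that concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the variable names (market_growth, reputation, sales) are hypothetical, and the data is generated so that market growth drives both of the other two while reputation has no effect on sales at all. The raw correlation between reputation and sales still looks respectable, but the partial correlation, which holds the third factor constant, falls to roughly zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for a single variable zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 2000

# Simulated, hypothetical data: market growth feeds into both measures,
# but reputation is never used when generating sales.
market_growth = [rng.gauss(0, 1) for _ in range(n)]
reputation = [g + rng.gauss(0, 1) for g in market_growth]
sales = [g + rng.gauss(0, 1) for g in market_growth]

r_raw = pearson(reputation, sales)
r_partial = partial_corr(reputation, sales, market_growth)

print(f"raw correlation, reputation vs sales:  {r_raw:.2f}")
print(f"controlling for market growth:         {r_partial:.2f}")
```

In real work there is rarely just one other factor, and you seldom know in advance which ones matter, so a check like this is a starting point for asking better questions rather than a verdict.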

What can we learn here? Well, as consultants and executives who rely on research to advise others and to help them make important business decisions, it is imperative, and I would say incumbent upon us, to always challenge simple correlations. We should always be seeking the wider picture and asking: but what else is at play here?

Where possible, it’s important to study the relationships between as many datasets as possible. That way we are more likely to get closer to the real story, and less likely to (wrongly) suggest that Nicolas Cage is responsible for the rise in pool deaths in the USA.

(Disclaimer: I did not vote for UKIP!)