Correlation vs. Causation - A Lesson in Careful Interpretation

Part 1 of the "Marketing Analyst's Guide to Business Statistics" Series

Several years ago, a manager of marketing and communications of a large gym franchise reached out to us to help interpret a pretty strange phenomenon. Since we love strange things, of course we took her up on the request to advise. For anonymity purposes, we will call her Jill.

The problem Jill was charged with solving was pretty simple: drive foot traffic back into their gyms. What they had learned over time was that driving gym traffic had a pretty impressive impact on new member acquisition. And if you think about it, it makes sense. When we have a goal, like, say, a fitness goal, we like to buy in with a partner. The accountability, camaraderie, and comfort of not working out alone are enough to explain it.

Jill sent us a PowerPoint deck she was due to present to senior leadership as well as a raw data file. Opening the PowerPoint deck first, and scanning through quickly, we stopped dead in our tracks when we read the following statement:

Our marketing communications to our members is causing foot traffic in our gyms to go down.

Wait, what? How would marketing communications actually inspire gym members to stop showing up?

As researchers, we get pretty comfortable in uncertain environments. We understand that an outcome can have multiple possible explanations. At times there is just one, but often enough we find several. So we went through the checklist. First, let’s see the communications strategy, notably the articulations architecture (you know, to make sure they did not mistakenly align themselves with a terrorist organization). It turned out that this was far from the case.

What about seasonality? Data tends to show that gym memberships spike at certain times of the year, and foot traffic tends to favor a pattern of the New Year’s Resolution. January foot traffic is heavy; it starts to fade in February. When March hits, well, it’s pretty much the gym rats left. The marketing communications rolled out in mid-April and the plan was to run through July. By the beginning of July, it certainly looked like it wasn’t exactly successful. But there’s got to be some alternative explanation.

OK, let’s look at the data. Running some exploratory output, it did in fact look like foot traffic was trending down during the two months of this communications campaign. What we noticed, however, was that there was subtle growth in a majority of gyms, and looking at response by campaign, it appeared that traffic was highest in the week following each individual campaign rollout. So, it appeared to be working in some areas. However, we found some pretty steep drop-off in 20% of their facilities.

So what’s going on in those specific 20%? Well, for one, it looks as if the bottom fell out in those markets. Foot traffic dropped substantially, almost overnight! So we simply filtered the results by market, and found a pretty common theme. Take a look at a few sample markets:

  • Columbus, Ohio
  • Ann Arbor, Michigan
  • Tuscaloosa, Alabama
  • Eugene, Oregon
  • Athens, Georgia
  • Lincoln, Nebraska
  • Tallahassee, Florida
  • Charlottesville, Virginia
  • College Station, Texas

You may have read the first several markets and not picked up a theme until the last one: College Station, Texas. College Station is the home of Texas A&M University. It is a city of about 100,000 people, and Texas A&M’s student population is nearly 60,000. In fact, every market on the list is a college town. And what happens in May and June in College Station, Texas? The town turns into a ghost town, as a substantial proportion of college students return home for the summer.
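To make the filtering step concrete, here is a minimal sketch of how an overall decline can hide growth in most gyms. The gym IDs, the "metro" and "college_town" labels, and all visit counts are made-up numbers for illustration, not Jill's data; we only assume each gym has a visit count before and during the campaign.

```python
from collections import defaultdict

# Hypothetical records, NOT the franchise's data:
# (gym_id, market_type, weekly_visits_before, weekly_visits_during)
visits = [
    ("gym_01", "metro", 1000, 1040),
    ("gym_02", "metro", 900, 930),
    ("gym_03", "metro", 1100, 1150),
    ("gym_04", "metro", 950, 980),
    ("gym_05", "college_town", 1200, 500),
]


def pct_change(before, during):
    """Percentage change from the 'before' period to the 'during' period."""
    return 100.0 * (during - before) / before


def change_by_segment(rows):
    """Sum visits within each market type, then compute the change per type."""
    totals = defaultdict(lambda: [0, 0])
    for _gym, segment, before, during in rows:
        totals[segment][0] += before
        totals[segment][1] += during
    return {seg: pct_change(b, d) for seg, (b, d) in totals.items()}


# Aggregate view: everything lumped together.
overall = pct_change(sum(r[2] for r in visits), sum(r[3] for r in visits))

# Segmented view: the same data, split by market type.
by_segment = change_by_segment(visits)

print(f"overall: {overall:+.1f}%")
for segment, change in by_segment.items():
    print(f"{segment}: {change:+.1f}%")
```

In this toy data, four of the five gyms grow, yet the total still falls because the one college-town gym loses most of its traffic. That is the same shape as what we saw: the aggregate says "down," while the segments tell two very different stories.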

So what was the takeaway from this? Offer a mobilized membership to college students.

Correlation Does Not Imply Causation

One of the hot-button topics, and something statisticians love to correct in those interpreting statistical output, is the difference between correlation and causation. In marketing research and business intelligence, we tend to understand what we collectively mean when looking at items in correlation. Ultimately, we just want to understand relationships. What variables of interest (call center responsiveness, customer service representative knowledge, price of dairy products, etc.) influence some desired outcome (customer satisfaction, loyalty, perceived brand value, etc.)?

Where we often get tripped up, however, is in the difference between correlation and causation. Correlation suggests two variables move together in a consistent pattern. So, a positive correlation suggests that when one variable increases, the other likely increases, and vice versa. A negative correlation suggests that as one variable increases, we see a pattern of the other decreasing, and vice versa. The bottom line here: correlation suggests that two items share some linear relationship. This linear relationship, however, does not suggest that one variable is directly responsible for the movement of the other. Causation, on the other hand, means that the increase or decrease of one variable is directly responsible for the increase or decrease of another variable.
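For readers who like to see the arithmetic, below is a minimal sketch of the Pearson correlation coefficient, the usual measure of a linear relationship. The function name `pearson_r` and the sample numbers (campaign emails sent versus gym visits) are our own illustrative assumptions, not figures from any client.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numbers for illustration only.
emails_sent = [1, 2, 3, 4, 5]
visits_rising = [12, 14, 15, 19, 20]   # tends to rise with emails_sent
visits_falling = [20, 19, 15, 14, 12]  # tends to fall as emails_sent rises

r_positive = pearson_r(emails_sent, visits_rising)
r_negative = pearson_r(emails_sent, visits_falling)

print(f"positive example: r = {r_positive:.2f}")
print(f"negative example: r = {r_negative:.2f}")
```

The coefficient runs from -1 to 1. Here the first pair comes out strongly positive and the second strongly negative, but notice that nothing in the calculation knows or cares why the numbers move together. That is exactly why a strong coefficient alone cannot establish causation.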

To illustrate, let’s throw out an example. There is a linear correlation between tornado damage in the United States and NBA TV ratings.

While this seems ridiculous, it perfectly illustrates the difference between correlation and causation. How can cool air from the Rocky and Appalachian Mountains converging with warm air traveling upward from the Gulf of Mexico, subsequently forming supercell thunderstorms, share a relationship with the percentage of US citizens watching an NBA basketball game? The answer is simple: tornado season, known to occur between mid-March and into early June, happens to share a similar date pattern to the NBA playoffs, where the NBA enjoys its greatest ratings.

The NBA did not specifically select tornado season to run its most important games. Tornadoes do not touch down as an atmospheric response to the impact of LeBron James’s buzzer-beating 3-pointer. The reality is that their peak activity happens to occur at the same time. The same can be said of much activity in marketing research and business intelligence.
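The tornado example can be sketched with a few toy numbers. In the snippet below, the "damage index" and "ratings index" values are invented purely for illustration (they are not real tornado or Nielsen figures), and the only assumption is that both are higher in spring than in the rest of the year. The idea is to compute the correlation once across all records, and then again within each season.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative records, NOT real data:
# (is_spring, tornado_damage_index, nba_ratings_index)
records = [
    (True, 9, 8), (True, 11, 8), (True, 9, 10), (True, 11, 10),
    (False, 1, 2), (False, 3, 2), (False, 1, 4), (False, 3, 4),
]


def correlation(rows):
    """Correlation between the damage index and the ratings index."""
    return pearson_r([r[1] for r in rows], [r[2] for r in rows])


# Across the whole year, the two indexes appear tightly linked.
overall = correlation(records)

# Holding the season fixed, the apparent link disappears.
spring_only = correlation([r for r in records if r[0]])
off_season_only = correlation([r for r in records if not r[0]])

print(f"all records:  r = {overall:.2f}")
print(f"spring only:  r = {spring_only:.2f}")
print(f"off-season:   r = {off_season_only:.2f}")
```

With these toy numbers the year-round correlation is strong (about 0.92), while within each season it is zero. The season is doing all the work: it pushes both variables up at the same time, which is the same role summer break played in Jill's college-town gyms.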

[Image: LeBron James in a tornado, an example of a correlation that implies no causation]

So why is this important? Well, assuming we are using research or business intelligence to inform business decisions, wouldn’t we prefer to understand that the item of interest actually influences our desired outcome?

 


 

And just for fun, here is an image representing a famous example of correlation not implying causation from Nate Silver, author of "The Signal and the Noise" and founder of fivethirtyeight.com.

[Image: ice cream and forest fire example of correlation not causation, from Nate Silver's "The Signal and the Noise"]