Tuesday, March 25, 2008

LFotW: Confusing Correlation with Causation

This fallacy is similar to the post-hoc fallacy in that it assumes cause and effect for two variables simply because they occur together. It is often used to give a statistical correlation a causal interpretation. For example, during the 1990s both religious attendance and illegal drug use were on the rise. It would be a fallacy to conclude from this alone that religious attendance causes illegal drug use. It is also possible that drug use leads to an increase in religious attendance, or that both are increased by a third variable, such as an increase in societal unrest, or even just population growth. It is also possible that the two variables are independent of one another, and that it is mere coincidence that both are increasing at the same time.
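The third-variable case can be made concrete with a toy simulation. The sketch below uses made-up numbers, not real data: it assumes a hypothetical "unrest" score that pushes both attendance and drug use up, with no direct link between the two. The helper names (`pearson`, `residuals`, `simulate_confounding`) are illustrative only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting its best straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate_confounding(n=5000, seed=0):
    """Return (raw, adjusted) correlation between attendance and drug use."""
    rng = random.Random(seed)
    # Hypothetical third variable; the scale is arbitrary.
    unrest = [rng.gauss(0, 1) for _ in range(n)]
    # Each outcome depends on unrest plus its own independent noise.
    # Neither outcome appears in the other's formula.
    attendance = [u + rng.gauss(0, 1) for u in unrest]
    drug_use = [u + rng.gauss(0, 1) for u in unrest]
    raw = pearson(attendance, drug_use)
    # Correlate only what unrest does not explain in each variable.
    adjusted = pearson(residuals(attendance, unrest),
                       residuals(drug_use, unrest))
    return raw, adjusted


raw, adjusted = simulate_confounding()
print(f"raw correlation: {raw:.2f}")
print(f"correlation after adjusting for unrest: {adjusted:.2f}")
```

With this setup the raw correlation should come out clearly positive, while the adjusted one should sit close to zero, even though neither variable ever influences the other. The adjustment only works here because the simulation tells us what the third variable is; in real data it may be unmeasured.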

The charge of confusing correlation with causation, however, tends to be abused, or applied inappropriately, to deny all statistical evidence. In fact, this constitutes a logical fallacy in itself, the denial of causation. This abuse takes two basic forms. The first is to deny the significance of correlations that are demonstrated with prospective controlled data, such as would be acquired during a clinical experiment. The problem with assuming cause and effect from mere correlation is not that a causal relationship is impossible; it is that there are other possible explanations that must be considered and cannot be ruled out a priori. A controlled trial, however, is designed to control for as many variables as possible in order to maximize the probability that a positive correlation is in fact due to causation.
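To see why a controlled design earns more trust than a raw correlation, here is a minimal sketch, again with invented numbers. It assumes a treatment that truly does nothing and a hidden "health" variable that affects the outcome. In the observational arm healthier people are more likely to take the treatment; in the randomized arm a coin flip decides. The function name `outcome_gap` and all parameters are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def outcome_gap(randomized, n=20000, seed=1):
    """Mean outcome of treated minus untreated, for a treatment with zero true effect."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden variable that drives the outcome
        if randomized:
            # Controlled trial: assignment ignores health entirely.
            takes_treatment = rng.random() < 0.5
        else:
            # Observational data: healthier people opt in more often.
            takes_treatment = health + rng.gauss(0, 1) > 0
        # The outcome never looks at takes_treatment, so the true effect is zero.
        outcome = health + rng.gauss(0, 1)
        (treated if takes_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)


print(f"observational gap: {outcome_gap(randomized=False):.2f}")
print(f"randomized gap:    {outcome_gap(randomized=True):.2f}")
```

In the observational arm the useless treatment appears to help, because the treated group was healthier to begin with. Random assignment balances health across the two groups, so the gap shrinks toward zero, and a gap that did survive randomization would be much harder to explain away.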

The second form is to dismiss purely epidemiological, or statistical, evidence altogether. Even with such evidence it is still possible to build a strong scientific case for a specific cause. The way to do this is to look at multiple independent correlations to see if they all point to the same causal relationship. For example, it was observed that cigarette smoking correlates with getting lung cancer. The tobacco industry, invoking the “correlation is not causation” logical fallacy, argued that this did not prove causation. They offered as an alternate explanation “factor x”, a third variable that causes both smoking and lung cancer. But we can make predictions based upon the hypothesis that smoking causes cancer. If this is the correct causal relationship, then duration of smoking should correlate with cancer risk, quitting smoking should decrease cancer risk, smoking unfiltered cigarettes should carry a higher cancer risk than smoking filtered cigarettes, and so on. If all of these correlations hold, which they do, then the smoking-causes-cancer hypothesis is supported above other possible causal relationships, and it is not a logical fallacy to conclude from this evidence that smoking probably causes lung cancer.
