Let’s imagine that we had a hunch that lighting incense at midnight contributes to weight loss, and we wanted to test that hunch. How would we do that? We would recruit lots of overweight adults and (with their permission) randomly assign them to two groups. The first group would receive a wake-up call every night at midnight and would then light some incense. The second group would also receive a wake-up call (so that the sleep deprivation itself is not a difference between the groups) but would do something else, like deep breathing exercises. Everyone in both groups would have their weight measured periodically, and we would calculate the difference in weight change between the two groups.
Let’s also imagine that this beautifully designed experiment fails to show any benefit in weight loss from lighting incense at midnight. The two groups’ weights don’t change, or change by the same amount, and despite our hunch we are forced to conclude that lighting incense at midnight has no effect on weight loss.
But we have an abiding subjective sense that incense at midnight is extremely healthy, and we’re sure it has a benefit that we haven’t found yet. (An abiding subjective sense is also called a bias.) So a few years later we decide that maybe lighting incense at midnight prevents tooth decay.
We think about doing another experiment just like the one above, but with the two groups followed to check for differences in dental cavity rates. Then we realize that we can save all the effort and expense by simply getting tooth decay data from the experiment we have already done. We can go back to the original study, get the dental records of all the participants in both groups, find all the cavities, and check whether the incense-lighters had fewer cavities than the non-incense-lighters. That should answer our question, right?
Unfortunately, no. The reason we can’t get reliable data from the prior experiment about cavity risk, cancer risk, or anything else other than weight loss is that the two groups are bound to differ in lots of ways simply by chance. One group likely has more redheads than the other, or is shorter on average, or richer, or lives closer to large bodies of water. That’s simply because everyone is different, and no two large groups of people (even randomized ones) will be identical in all characteristics. So if we went back to our does-incense-help-weight-loss study and looked for differences other than weight loss, we would very likely find some differences purely by chance.
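To see how easily chance alone produces lopsided groups, here is a small simulation sketch. Everything in it is hypothetical: the functions `split_redheads` and `share_imbalanced`, the 100 people per group, and the assumed 10 percent redhead rate are made up for illustration and do not come from any real trial. Hair colour plays no role in who ends up in which group, yet the two groups still often end up with noticeably different numbers of redheads.

```python
import random

def split_redheads(n_per_group=100, p_redhead=0.1, seed=None):
    """Randomly assign 2 * n_per_group people to two groups and return the
    number of redheads in each group. Hair colour plays no part in the
    assignment, so any difference between the counts is pure chance."""
    rng = random.Random(seed)
    # Hypothetical participants: True marks a redhead (assumed 10% rate).
    people = [rng.random() < p_redhead for _ in range(2 * n_per_group)]
    rng.shuffle(people)  # the random assignment step
    group_a, group_b = people[:n_per_group], people[n_per_group:]
    return sum(group_a), sum(group_b)

def share_imbalanced(trials=1000, min_gap=5, **kwargs):
    """Repeat the randomization `trials` times and return the fraction of
    runs in which the groups' redhead counts differ by at least `min_gap`."""
    hits = 0
    for t in range(trials):
        a, b = split_redheads(seed=t, **kwargs)
        if abs(a - b) >= min_gap:
            hits += 1
    return hits / trials

print(split_redheads(seed=42))
print(share_imbalanced())
```

A gap of five or more redheads shows up in a sizeable share of runs. A real trial population differs in dozens of traits at once (height, income, distance from water), so some imbalance somewhere is the norm rather than the exception.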
To avoid being fooled by random differences, scientists draw a sharp distinction between studies that look at primary endpoints and studies that look at post-hoc endpoints. A primary endpoint is the effect that a study was designed to measure. Before the study begins, the scientists have to clearly define and publicize their primary endpoint. In the example above the primary endpoint is weight loss. A study that shows a difference in its primary endpoint is reliable because the scientists found the effect they said in advance they were looking for. The likelihood of that happening by chance alone is low: by the usual convention, a result counts as statistically significant only if a difference that large would turn up less than 5 percent of the time if the intervention had no effect.
A post-hoc endpoint is one chosen after the trial has finished, by going back to the same data to see whether some other characteristic differs between the groups. In the example above, if we went back after the study was completed and compared the two groups’ rates of tooth decay or cancer, those would be post-hoc endpoints. Such analyses are notoriously unreliable because the likelihood of finding a difference between the groups that has nothing to do with the experimental intervention is very high. If you look at enough endpoints, you will almost certainly find a difference between the two groups that was not caused by lighting incense but merely reflects random differences between the individuals who ended up in each group.
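The "look long enough" problem can be made concrete with a little arithmetic. If each endpoint is tested at the conventional 5 percent threshold, and we assume (for simplicity) that the endpoints are independent, the chance that at least one of k endpoints looks "significant" when the intervention does nothing is 1 − 0.95^k. The sketch below computes that and also checks it by simulation. The function names, the normally distributed outcomes with a known standard deviation of 1, and the simple two-sample z-test are all assumptions chosen for illustration; real trial endpoints are correlated, so the exact numbers in a real study would differ.

```python
import random
from statistics import NormalDist, fmean

def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent endpoints is
    'significant' at level alpha when the intervention has no effect."""
    return 1 - (1 - alpha) ** k

def simulate_post_hoc(k=20, n_per_group=100, alpha=0.05, trials=300, seed=0):
    """Fraction of simulated trials in which at least one of k post-hoc
    endpoints shows a 'significant' difference, even though every outcome
    is drawn from the same distribution in both groups."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5  # std. error of a difference in means (sd = 1)
    trials_with_a_hit = 0
    for _ in range(trials):
        for _ in range(k):
            # Both groups come from N(0, 1): there is no real effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            z = (fmean(a) - fmean(b)) / se
            p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided z-test
            if p < alpha:
                trials_with_a_hit += 1
                break  # one spurious "finding" is enough for a headline
    return trials_with_a_hit / trials

print(chance_of_false_positive(1))   # one prespecified endpoint
print(chance_of_false_positive(20))  # twenty post-hoc looks
print(simulate_post_hoc())
```

For a single prespecified endpoint the formula gives 0.05; for 20 post-hoc looks it gives roughly 0.64, and the simulation should land in the same neighbourhood. That gap is the whole reason a primary endpoint has to be declared before the data come in.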
This is exactly the problem plaguing the studies released this week in The Lancet that attempt to link aspirin to cancer prevention. They received a lot of publicity but will not affect medical practice. They are mostly re-analyses of studies originally designed to discover whether aspirin prevents strokes or heart attacks. It does. But using the same data set to ask whether aspirin prevents cancer leaves us vulnerable to the spurious results that post-hoc endpoints allow.
So most doctors, appropriately, will still not recommend aspirin for cancer prevention. We need a large prospective randomized trial to settle the question. Because aspirin is inexpensive, such a trial is unlikely to be sponsored by pharmaceutical companies, but I would think that makes it a perfect candidate for a government-sponsored study.
Studies Link Daily Doses of Aspirin to Reduced Risk of Cancer (NY Times)
Studies Find New Evidence Aspirin May Prevent Cancer (Wall Street Journal)
Should you take aspirin to prevent or treat cancer? (LA Times Booster Shots)
Short-term effects of daily aspirin on cancer incidence, mortality, and non-vascular death: analysis of the time course of risks and benefits in 51 randomised controlled trials (Lancet article, abstract available without subscription)
Effect of daily aspirin on risk of cancer metastasis: a study of incident cancers during randomised controlled trials (Lancet article, abstract available without subscription)
Effects of regular aspirin on long-term cancer incidence and metastasis: a systematic comparison of evidence from observational studies versus randomised trials (Lancet Oncology, abstract available without subscription)
Effect of Aspirin on Vascular and Nonvascular Outcomes (Archives of Internal Medicine article, January)