Anecdotes can be weak evidence
We all know that a lot of people overvalue their N=1 self-study: “I started eating only meat and my arthritis was cured!” Unfortunately, the weakness of this data is often attributed to the fact that it is an anecdote, and so anecdotes are wrongly vilified as “the bottom of the evidential hierarchy.” In the case of a diet change, the remission of arthritis could be due to any number of things: time (perhaps it would have healed on its own, diet change or not), a reduction in stress leading to lower inflammation, a higher pain tolerance, or the person might simply have lied.
Anecdotes can be strong evidence
Anecdotes can actually be incredibly strong evidence, and Bayes’ theorem tells us as much. An anecdote is strong evidence for a reasonable hypothesis when there is no alternative reasonable hypothesis. That said, the apparent lack of alternative reasonable hypotheses is sometimes due to one’s own lack of knowledge or imagination. I try to counter this, to some degree, by coming up with a mechanistic hypothesis.
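To make the claim precise, here is the odds form of Bayes’ theorem, where H is the hypothesis of interest, ¬H is the catch-all of alternative explanations, and D is the anecdotal observation (nothing here beyond the standard formula):

  P(H | D) / P(¬H | D) = [ P(H) / P(¬H) ] × [ P(D | H) / P(D | ¬H) ]

The rightmost factor is the Bayes factor. If no alternative explanation makes the observation anything other than wildly improbable, P(D | ¬H) is tiny, the Bayes factor is huge, and a single anecdote can move even a modest prior most of the way to certainty. The fragility is all in that denominator: a missed alternative hypothesis inflates P(D | ¬H) and shrinks the update.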
A gross example (I apologise)
For example, I recently heard that armpit odour is caused by bacteria. Bacteria can be killed by exposing them to an environment that strongly deviates from the one in which they thrive. So I scrubbed my armpits with concentrated vinegar, which is acidic (I’d read that armpit bacteria prefer a basic environment). I smelled like a salad, so I took a shower, and my underarm odour, which I’d had for over a decade, was entirely gone.
Even at this point, I was extremely confident that this remedy worked, even if I’d got the mechanism wrong. Why the confidence from so little evidence? Well, what are the chances that the smell just happened to go away on its own at that exact moment? Basically zero. Are there any other reasonable causes for the smell to disappear at the exact moment I used the vinegar? Not that I can think of. Why would anything else have been the cause now, but at no point before? Is the data reliable? I think so: vinegar has never affected my ability to detect odours before.
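For the curious, here is the toy version of that update. Every number is invented purely for illustration (the variable names are mine too); the point is the shape of the arithmetic, not the values:

```python
# Toy Bayesian update for the vinegar anecdote. All numbers are illustrative guesses.
prior_vinegar_works = 0.30        # credence before trying it
p_gone_if_works = 0.90            # if vinegar works, the odour very likely vanishes right then
p_gone_otherwise = 0.005          # chance a decade-old odour vanishes on its own that exact day

# Odds form of Bayes' theorem: posterior odds = prior odds * Bayes factor
prior_odds = prior_vinegar_works / (1 - prior_vinegar_works)
bayes_factor = p_gone_if_works / p_gone_otherwise
posterior_odds = prior_odds * bayes_factor
posterior = posterior_odds / (1 + posterior_odds)

print(f"Bayes factor: {bayes_factor:.0f}, posterior: {posterior:.3f}")  # Bayes factor: 180, posterior: 0.987
```

The whole update hinges on how small p_gone_otherwise really is, which is exactly the “no alternative reasonable hypothesis” judgement; the data itself can’t tell you that number.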
A few days later, my body odour returned a little, so I did the same thing, and it was gone again.
If my mechanistic hypothesis was right, I should expect the bacteria to evolve resistance to these new shocks to their environment. To minimise that chance, I decided to switch back and forth between vinegar and hand sanitiser, reasoning that large swings between multiple extreme environments would be harder to evolve defences against. (I’d wager that rabbits probably wouldn’t have survived myxomatosis if we had simultaneously infected them with ten other lab-made diseases.) The sanitiser also eliminated the odour, which aligns with my mechanistic hypothesis (i.e. bacteria cause the smell and environmental changes kill the bacteria).
At this point, I considered my mechanistic hypothesis to be almost certain, but not quite: There may be alternative explanations that I haven’t considered.
It’s all Bayesian
All studies, of any form, are extremely good at answering very, very specific questions, which are usually not the questions that you’re interested in.
My N=1 vinegar study pretty definitively answers one question: “Will my body odour reduce at this specific date, time, and location after I scrub concentrated vinegar under my armpits with a paper towel?” However, that data alone doesn’t tell me whether the relationship is causal; it doesn’t tell me whether it works for other people; it doesn’t tell me whether it will work at a different time or place; it doesn’t tell me what the mechanism is. To answer those questions, I need some reason to believe that the specific data I’ve gathered has a Bayes factor different from one with respect to each of them.
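In the same odds form as before: for a broader question Q (say, “vinegar reduces underarm odour in general”), the Bayes factor of my data D is

  BF = P(D | Q is true) / P(D | Q is false)

If that ratio equals one, my data is literally uninformative about Q, however striking it was in the moment. It bears on the broader question only to the extent that the ratio departs from one, and judging that ratio takes background knowledge (mechanism, how plausible the alternatives are), not just the data itself.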
In the vinegar example, I could infer causality because I had more than just the data. I had the data, a causal hypothesis, and no competing hypothesis. Together, these do imply causality, even though I wasn’t sure I was right about the underlying mechanism. The causal hypothesis doesn’t need to be a detailed mechanistic one (it might just be “vinegar will remove the smell”). As long as nothing else could have caused the outcome, you know what the cause is, even if you’re unsure of the underlying mechanics.
Let’s see how going from the data to the questions of interest can go wrong.
Randomised control trials can actually be incredibly poor evidence
From what I’ve seen, the people who scoff at anecdotes often decide literature debates by counting how many studies “support” a particular view. I hate this approach, and how common it is. Truth is not democratic. Yes, fine, counting studies is an okay rule of thumb; it’s a decent starting point. But we can do so much better.
I adjust my credences based on how well a study’s data actually answers the question of interest. Depending on the field, sometimes the relevant questions are barely answered at all, despite the paper’s authors’ claims to the contrary.
Did an RCT prove that increased consumption of saturated fat doesn’t increase cholesterol? (Nope.)
A study took a particular group of people and fed them one extra egg per day (on top of their regular diet). The group’s average cholesterol level did not increase by a statistically significant amount. The study’s authors concluded that saturated fat has no effect on cholesterol. Reasonable, right? Well… no, it isn’t. If you think about it properly, the data doesn’t actually support the authors’ conclusion.
The main issue is that by increasing saturated fat by a little, you only measure part of the curve that defines the relationship between saturated fat intake and blood-level cholesterol. Who said this curve was linear? It turns out it’s not: it’s more of a logarithmic shape. And the specific group of people selected by the study all had high blood-level cholesterol to begin with, i.e. they were all on the flat portion of the curve. So increasing saturated fat consumption by a little would have no measurable effect, even though decreasing consumption by a lot would have a large effect.
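Here’s a toy model of that curve, just to make the geometry vivid. The logarithmic shape matches the description above, but the constants and units are made up:

```python
import math

def blood_cholesterol(sat_fat_g_per_day):
    """Toy log-shaped dose-response curve; constants are invented for illustration."""
    return 150 + 40 * math.log1p(sat_fat_g_per_day / 5)

high_intake = 40  # a participant already on the flat part of the curve

# Adding a small amount barely moves the needle...
print(blood_cholesterol(high_intake + 5) - blood_cholesterol(high_intake))    # ~ +4
# ...but removing a lot moves it substantially.
print(blood_cholesterol(high_intake) - blood_cholesterol(high_intake - 30))   # ~ 44, i.e. a ~44-point drop
```

A null result from the first comparison says nothing about the second, and the second is the question people actually care about.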
(Why the scientists chose a bad study design is another question altogether that’s not relevant to what we’re interested in. If you must know, it was funded by the meat, dairy and egg industry. Why isn’t this a primary consideration in evaluating studies? Because industry funding doesn’t guarantee bad studies. Bad study design guarantees bad studies.)
And if you have a study that actually answers the question of interest, i.e. “Will a person’s blood-level cholesterol go down if they consume less saturated fat?”, the answer is absolutely “yes”. In fact, we know the causal relation so well that there is a formula that accurately predicts changes in blood-level cholesterol, with changes in saturated fat intake as one of its inputs.
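For the curious: the best-known formula of this kind, if I’m remembering it correctly, is the Keys equation, which looks roughly like

  ΔChol ≈ 1.35 (2ΔS − ΔP) + 1.5 ΔZ

where ΔChol is the change in serum cholesterol (mg/dL), ΔS and ΔP are the changes in saturated and polyunsaturated fat as a percentage of calories, and ΔZ is the change in the square root of dietary cholesterol (mg per 1000 kcal). Treat the exact coefficients as my recollection rather than gospel; the point is that such predictive formulas exist.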
RCTs good, anecdotes bad?
Unfortunately, figuring out the truth is hard. There are so many things that can go wrong in the process. You should always be sceptical when someone has an easy answer, like “Just look at the RCTs” or “ignore the anecdotes”. But there is at least one simple bit of advice that’s always reliable: Ask yourself, “What would Bayes think about this data?”
Good post! Spencer Greenberg has a post with similar thoughts on this:
https://www.spencergreenberg.com/2021/05/is-learning-from-just-one-data-point-possible/
Wow, that essay explains strong anecdotes a lot better than I did. I knew about the low-variance aspect, but his third point and onwards made things even clearer for me. Thanks for the link!
They can be (deterministic Bayesian updating is just causal inference), but they can also not be (probabilistic Bayesian updating requires a large sample size; also, sampling bias is universally detrimental to accurate learning).
Yep, I agree.
Maybe I should have gone into why everyone puts anecdotes at the bottom of the evidence hierarchy. I don't disagree that they belong there, especially if all else between the study types is equal. And even if the studies are quite different, the hierarchy is a decent rule of thumb. But it becomes a problem when people use it to disregard strong anecdotes and take weak RCTs as truth.
I think so too! A strong anecdote can directly illustrate a cause-and-effect relationship that is consistent with a certain plausible theory of the underlying system. And correct causal understanding is essential for making externally valid predictions.