<sharing it here, too...>
~ a community of Bayes-enthusiasts fumble statistical inference ~
TL;DR — Industry uses Dirichlet Process and SAS, NOT Bayes. Bayes is persistently *wrong* and lacks a great deal of important information. Supposed ‘rationalists’ cling to Bayes as the Ultimate Truth, without knowing enough Mathematics to know they’re wrong.
“Oh, well my Prior was <preferred assumption> but I guess I have to update with that one data-point that wandered into my life.” — multiple ‘Rationalists’ in my year of invading their gatherings
A weird thing is happening in the Bay Area, slowly creeping into the Zeitgeist: a group of non-mathematicians have decided they found the BEST statistical technique ever, and they want to use it to understand the whole world… but their technique is 260 YEARS OLD, and we’ve done a LOT better since then. It’s called Bayes’ Theorem, published in 1763 — literally 260 candles this year.
Let’s get a sense of just how out-dated and bizarre it is, to insist you have the One-True-Method when it’s 260 years old: back in 1763, when Bayes was published, there was another new-fangled invention sweeping Europe — the Dutch Plough. That’s the plough used today by the Amish. Literally, relying on Bayes to draw conclusions is like farming with an Amish plough; it’s hilariously inadequate, and completely dismissed by industry.
That quote at the top is an amalgam of multiple conversations with the Effective Altruists and Astral Codex Ten ‘Rationalists’ (they made that term up to describe themselves); it’s a persistent theme in their conversations. And, it’s not even the *correct* use of Bayes! Let’s see why:
In Bayes’ Theorem, you begin with a Prior. These Rationalists pick the Prior that they *prefer*. Neutral Bayesian Priors, however, are the average of all possible assumptions, NOT your preferred place to start. These folks’ first step is a disastrous error. Then, they say “I guess I should update my Prior…” Wait! Why in the world would you ever feel confident about a belief when the ONLY thing you have is a Prior? A Prior is, by definition, the state of “no information,” when one should have intellectual humility, not certainty!
Then, they are updating their Bayesian estimate using… a *few* examples? The Rationalists repeatedly rely upon sparse evidence while claiming certainty, as if “Statistically Significant Sample Size” just isn’t a thing. Bayes doesn’t *need* statistical significance, apparently! Finally, those examples they use are culled from personal experience. I hope I don’t have to explain to anyone why we need to collect a random sample from representative sub-populations? The supposedly rational Bayes-fans fail on each possible count.
So, if they correct those mistakes, can they then rely on Bayes to find their precious truths? Nope. Bayes is consistently wrong, reliably. That’s why industry doesn’t use it. They’d lose money. Dirichlet lets them make money, because it works better. That’s a stronger proof, empirically, than all the rationalizations of their community’s prominent Bayes-trumpeters: a fiction writer and a psych counselor, both of whom lack relevant experience with statistical analysis software and techniques.
In particular, the blog of that psych counselor, “Astral Codex Ten,” has a tag-line: it quotes Bayes’ Theorem, and follows by saying “all else is commentary.” Everyone who reads his blog, and who then DOESN’T check what statistical techniques are used in the real world, stays there as part of the community. They have self-selected for a community of people who call Bayes the be-all-end-all, all of them agreeing they’re right, and they don’t know that they’re horribly wrong… because they don’t check!
Think about this for a moment: if you state Bayes’ Theorem, and then claim “all else is commentary” while recommending readers use Bayes, you are implicitly claiming “NO further improvements in statistical analysis have occurred in the 260 years since Bayes was published; Student’s t Distributions, Lévy Distributions, they don’t even need to exist!” That’s the core tenet of the Bay Area Rationalists’ luminary, addicted to Bayes.
Wait, so why and how is Dirichlet such an improvement?
Let’s imagine you took a survey in some big city, and found (unsurprisingly) a majority Democrats — it was a 60/40 split, on the nose. That sample’s split is also the “maximum likelihood” for the potential Population. Said another way, “The real-world population which is most likely to give you a 60/40 sample is a 60/40 population.” But, does that make 60/40 your best guess for the real population? No.
Imagine each possible population, one at a time. There’s the 100% Democrat population, first — what is the *likelihood* of such a population producing a 60/40 sample? Zero. What about 99% Democrat? Well, then it’ll depend upon how *many* people you surveyed, but there is just a tiny chance the real population is 99% Democrat! Keep doing that, for every population, all the way to 99% Republican, then 100% Republican. Whew! Now, you have a *likelihood* distribution, the “likelihood of population X generating sample Y.”
When we look at this distribution, for data that falls in two buckets (D/R), we’ll notice something: the *peak* likelihood is at 60/40, but there’s ALSO a bunch of probability-mass on the 50/50 side of the curve, creating a tilt to the overall probability. While the ‘mode’ of the likelihood distribution is still the 60/40 estimate, the actual ‘mean’ of that distribution is closer to 50/50, every time! You *should* expect that the true population is closer to an *equal division* among buckets. When you collect more samples, you narrow that distribution of likelihoods, so you see less drift toward 50/50. That’s the reason you want a ‘statistically significant sample size’.
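To make the survey walk-through concrete, here is a minimal Python sketch. It assumes a simple binomial sampling model and a grid of 1,001 candidate population shares; the function names (`likelihood_weights`, `mode_and_mean`) and the sample sizes of 10 and 1,000 are illustrative choices, not anything from the post. It scores every candidate population by how likely it is to produce the observed split, normalizes those scores, and compares the peak with the average.

```python
from math import exp, log, inf


def likelihood_weights(k, n, grid_size=1001):
    """Score each candidate Democrat share p by the binomial likelihood of
    seeing k Democrats in a sample of n, then normalize the scores to sum to 1."""
    shares = [i / (grid_size - 1) for i in range(grid_size)]
    logs = []
    for p in shares:
        if 0.0 < p < 1.0:
            logs.append(k * log(p) + (n - k) * log(1 - p))
        else:
            # A unanimous population can only yield a unanimous sample.
            unanimous = (p == 1.0 and k == n) or (p == 0.0 and k == 0)
            logs.append(0.0 if unanimous else -inf)
    top = max(logs)
    raw = [exp(v - top) for v in logs]  # work in log space to avoid underflow
    total = sum(raw)
    return shares, [r / total for r in raw]


def mode_and_mean(k, n):
    """Return the peak and the weighted average of the normalized likelihood."""
    shares, weights = likelihood_weights(k, n)
    peak_index = max(range(len(weights)), key=weights.__getitem__)
    mean = sum(p * w for p, w in zip(shares, weights))
    return shares[peak_index], mean


# Hypothetical surveys with the same 60/40 split but different sizes.
small_mode, small_mean = mode_and_mean(6, 10)
large_mode, large_mean = mode_and_mean(600, 1000)
print(f"n=10:   mode={small_mode:.3f} mean={small_mean:.4f}")
print(f"n=1000: mode={large_mode:.3f} mean={large_mean:.4f}")
```

Under these assumptions the peak sits at 0.6 for both surveys, while the average lands near (k+1)/(n+2): about 0.583 for the 10-person survey and about 0.5998 for the 1,000-person one, so the pull toward 50/50 shrinks as the sample grows.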
Let’s look at that other aspect Dirichlet possesses, which Bayes wholly lacks: Confidence!
When you look at the likelihood of each population, the chance of it producing your observed sample, you can also ask: “How far AWAY from our best guess would we need to place boundaries, such that we include 95% of the possible populations’ likelihoods within our bounds?” That’s called your Confidence Interval! You may have only learned the trimmed-down simplicities and z-score tables in your Stat 101 class, but there’s a reason for why they can claim confidence: that interval of population-estimates contains 95% of the likelihood-distribution’s probability-mass!
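The interval described above can be sketched the same way. This is a rough illustration, assuming a binomial sampling model, a grid of candidate shares, and an equal-tailed interval that cuts 2.5% of the normalized likelihood mass off each end; `likelihood_weights` and `central_interval` are hypothetical helper names.

```python
from math import exp, log, inf


def likelihood_weights(k, n, grid_size=1001):
    """Normalized binomial likelihood of k successes in n draws, on a grid of shares."""
    shares = [i / (grid_size - 1) for i in range(grid_size)]
    logs = []
    for p in shares:
        if 0.0 < p < 1.0:
            logs.append(k * log(p) + (n - k) * log(1 - p))
        else:
            unanimous = (p == 1.0 and k == n) or (p == 0.0 and k == 0)
            logs.append(0.0 if unanimous else -inf)
    top = max(logs)
    raw = [exp(v - top) for v in logs]
    total = sum(raw)
    return shares, [r / total for r in raw]


def central_interval(k, n, mass=0.95):
    """Walk the grid from left to right, accumulating likelihood mass, and
    return the shares where the running total crosses each tail cutoff."""
    shares, weights = likelihood_weights(k, n)
    tail = (1 - mass) / 2
    running = 0.0
    low = high = None
    for p, w in zip(shares, weights):
        running += w
        if low is None and running >= tail:
            low = p
        if running >= 1 - tail:
            high = p
            break
    return low, high


# Same 60/40 split, two hypothetical sample sizes.
small_low, small_high = central_interval(6, 10)
large_low, large_high = central_interval(600, 1000)
print(f"n=10:   [{small_low:.3f}, {small_high:.3f}]")
print(f"n=1000: [{large_low:.3f}, {large_high:.3f}]")
```

The bounds tighten sharply as the sample grows, which is the point of asking for a larger survey.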
Finally, let’s consider “the cost of being wrong”. Bayes doesn’t balance your prediction according to the cost of being wrong; Dirichlet’s distribution over potential populations can simply be *multiplied* by the cost of each error-distance, and then the mode of that distribution will “minimize the COST of being WRONG.” You can even multiply by costs which are discontinuous or ranges, producing high and low bounds and nuanced thresholds of risk. Definitely better than Bayes.
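One way to read the cost-weighting step is as an expected-cost calculation: for each candidate estimate, average the cost of the miss over every possible population, weighted by the normalized likelihood, and keep the estimate with the lowest average. The sketch below takes that reading rather than a literal pointwise multiplication. It assumes a binomial sampling model and a coarse grid, and both cost functions (`squared_cost`, `lopsided_cost`) are made-up examples.

```python
from math import exp, log, inf


def likelihood_weights(k, n, grid_size=401):
    """Normalized binomial likelihood of k successes in n draws, on a grid of shares."""
    shares = [i / (grid_size - 1) for i in range(grid_size)]
    logs = []
    for p in shares:
        if 0.0 < p < 1.0:
            logs.append(k * log(p) + (n - k) * log(1 - p))
        else:
            unanimous = (p == 1.0 and k == n) or (p == 0.0 and k == 0)
            logs.append(0.0 if unanimous else -inf)
    top = max(logs)
    raw = [exp(v - top) for v in logs]
    total = sum(raw)
    return shares, [r / total for r in raw]


def best_estimate(shares, weights, cost):
    """Return the candidate estimate with the lowest likelihood-weighted cost."""
    def expected_cost(estimate):
        return sum(cost(estimate, truth) * w for truth, w in zip(shares, weights))
    return min(shares, key=expected_cost)


def squared_cost(estimate, truth):
    # Symmetric cost: misses in either direction hurt equally.
    return (estimate - truth) ** 2


def lopsided_cost(estimate, truth):
    # Hypothetical asymmetric cost: overshooting hurts three times as much
    # as undershooting by the same amount.
    gap = estimate - truth
    return 3 * gap if gap > 0 else -gap


shares, weights = likelihood_weights(6, 10)  # 6 Democrats out of 10 surveyed
squared_pick = best_estimate(shares, weights, squared_cost)
lopsided_pick = best_estimate(shares, weights, lopsided_cost)
print(f"squared cost pick:  {squared_pick:.4f}")
print(f"lopsided cost pick: {lopsided_pick:.4f}")
```

With a symmetric squared cost the pick lands on the distribution's mean; with the lopsided cost it shifts downward, because guessing too high is penalized more heavily. Discontinuous or range-based costs drop in the same way, since `cost` can be any function of the estimate and the truth.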
Now, Dirichlet isn’t even the be-all-end-all… it was published in 1973, 50 years old THIS year! SAS has held trade secrets since the ’70s, and invests 2.5x more into R&D than the TECH-industry average! If you want to pass muster for pharmaceuticals in front of the FDA, you send all your data to SAS. It’s required, because they’re soooo damn GOOD! So, unless you work at SAS (which has the highest profits per employee hour of all companies on Earth, and has expanded consistently since 1976… consistently rated one of the best employers on the planet…) then you DON’T know the be-all-end-all statistical technique, and neither do Scott Alexander or Eliezer Yudkowsky, as much as they’d like you to believe otherwise. Just for reference, when “you think you’re right BECAUSE you don’t know enough to know you’re wrong,” that’s called the Dunning-Kruger Effect, dear Rationalists.
While I expect some EAs and rationalists do actually use Bayes formally in their analysis, a lot of its use is informal, using language associated with Bayes to communicate an approach to updating beliefs.
From this informal perspective, clarity and conciseness matter far more than empirical robustness.
"From this informal perspective, clarity and conciseness matter far more than empirical robustness."
Then you are admitting my critique: "Your community uses excuses, to allow themselves a claim of epistemic superiority, when they are actually using a technique which is inadequate and erroneous." Yup. Thanks for showing me and the public your community's justification for using wrong techniques while claiming you're right. Screenshot done!
Why is it inadequate to use language associated with Bayes in an informal analysis? Are you suggesting that when people communicate about their beliefs in day-to-day conversation, they should only do so after using Dirichlet or another related process? Can you see how that is, in fact, extremely impractical? Can you see that it is rational to weigh the costs and benefits of a particular technique, and that while empirical robustness is overwhelmingly important in some contexts, a method is not always the rational choice when the costs of using it are too high?
Please keep taking screenshots! I'm sure you wouldn't want to mislead your audience by only showing part of the discussion out of context :)
You're welcome to side with convenience; I am not commanding you to perform Dirichlet. Yet! If you take that informality, you give up accuracy. You become MoreWrong, and should not be believed as readily as you would like.
Oh, you entirely missed my purpose: I was sharing this with your community, as a courtesy. I publish on different newsletters online, and I wrote for that audience ABOUT your community, and about the fact that you're not interested in learning about Dirichlet when it's industry-standard (demonstrating its superiority empirically, not with anecdotes you find palatable). So, no, I don't plan to present myself in a way you approve of as a pre-requisite to you noticing that Bayes is out-dated by 260 years of improvements. Dirichlet, logically, would NOT have been published and adopted in 1973 and since, if it were in fact inferior to Bayes.
You evidence the same spurious assumptions and lack of attention to core facts: Dirichlet is an improvement, obviously, by coming along later and being adopted generally. I also addressed the key information which Dirichlet provides, which Bayes' Theorem is incapable of generating: a Likelihood Distribution across possible Populations, and the resultant Confidence Interval, as well as weighting your estimate to Minimize the Cost of being Wrong. Those are all key, valuable pieces of information that Bayes' Theorem will not give you on its own. When Scott Alexander claims "Bayes' Theorem; all else is commentary" he leaves out critical, incomparable improvements in our understanding.