I feel very warmly about using relatively quick estimates to carry out sanity checks, i.e., to quickly check whether something is clearly off, whether some decision is clearly overdetermined, or whether someone is just bullshitting. This is in contrast to Fermi estimates, which aim to arrive at an estimate for a quantity of interest, and which I also feel warmly about but which aren’t the subject of this post. In this post, I explain why I like quantitative sanity checks so much, and I give some examples.
Why I like this so much
I like this so much because:
- It is very defensible. There are some cached arguments against more quantified estimation, but sanity checking cuts through most, if not all, of them. “Oh, well, I just think that estimation has some really nice benefits in terms of sanity checking and catching bullshit, and in particular in terms of defending against scope insensitivity. And I think we are not even at the point where we are deploying enough estimation to catch all the mistakes that would be obvious in hindsight after we did some estimation” is both something I believe and also just a really nice motte to retreat to when I am tired, don’t feel like defending a more ambitious estimation agenda, or don’t want to alienate someone socially by having an argument.
- It can be very cheap, a few minutes, a few Google searches. This means that you can practice quickly and build intuitions.
- They are useful, as we will see below.
Some examples
Here are a few examples where I’ve found estimation to be useful for sanity-checking. I mention these because I think that the theoretical answer becomes stronger when paired with a few examples which display that dynamic in real life.
Photo Patch Foundation
The Photo Patch Foundation is an organization which has received a small amount of funding from Open Philanthropy:
Photo Patch has a website and an app that allows kids with incarcerated parents to send letters and pictures to their parents in prison for free. This diminishes barriers, helps families remain in touch, and reduces the number of children who have not communicated with their parents in weeks, months, or sometimes years.
It takes little digging to figure out that their costs are $2.5/photo. If we take the AMF numbers at all seriously, it seems very likely that this is not a good deal. For example, for $2.5 you can deworm several kids in developing countries, or buy a bit more than one malaria net. Or, less intuitively, trading a 0.05% chance of saving a statistical life for sending a photo to a prisoner seems like a pretty bad trade (0.05% is what $2.5 buys at roughly $5,000 per statistical life). That 0.05% of a statistical life corresponds to 0.05/100 × 70 years × 365 days/year ≈ 12.8 statistical days (12 if you round down).
One can then do somewhat more elaborate estimations about criminal justice reform.
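The Photo Patch arithmetic above can be sketched in a few lines of Python. This is only an illustration: the $2.5/photo cost and the 70-year life come from the post, while the $5,000 per statistical life is an assumption, chosen because it is the figure implied by "0.05% per $2.5". The function and constant names are made up for this sketch.

```python
# Figures from the post
COST_PER_PHOTO_USD = 2.5
YEARS_PER_LIFE = 70
DAYS_PER_YEAR = 365

# Assumption (not stated in the post): the cost per statistical life
# implied by "$2.5 buys a 0.05% chance", i.e. 2.5 / 0.0005.
COST_PER_LIFE_USD = 5_000


def statistical_days(spend_usd, cost_per_life_usd=COST_PER_LIFE_USD):
    """Convert a donation into the statistical days of life it could buy."""
    fraction_of_life = spend_usd / cost_per_life_usd
    return fraction_of_life * YEARS_PER_LIFE * DAYS_PER_YEAR


days = statistical_days(COST_PER_PHOTO_USD)
print(f"One photo costs about {days:.1f} statistical days")
```

Changing `COST_PER_LIFE_USD` is a quick way to see how sensitive the comparison is to the benchmark you pick.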
Sanity-checking that supply chain accountability has enough scale
At some point in the past, I looked into supply chain accountability, a cause area related to improving how multinational corporations treat labor. One quick sanity check is, well, how many people does this affect? You can check, and per here [1], Inditex (a retailer which owns brands like Zara, Pull&Bear, Massimo Dutti, etc.) employed 3M people in its supply chain as of 2021.
So the scale is large enough that this may warrant further analysis. Once this simple sanity check is passed, one can then go on and do some more complex estimation about how cost-effective improving supply chain accountability is, like here.
Sanity checking the cost-effectiveness of the EA Wiki
In my analysis of the EA Wiki, I calculated how much the person behind the EA Wiki was being paid per word, and found that it was in the ballpark of other industries. If it had been egregiously low, my analysis could have been shorter, and maybe concluded that this was a really good bargain. If the amount had been egregiously high, maybe I would have had to dig in about why that was.
As it was, the sanity check was passed, and I went on to look at other considerations.
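A pay-per-word check of this kind might look like the sketch below. Every number in it is a hypothetical placeholder, not a figure from the EA Wiki analysis, and the factor-of-ten threshold for "egregious" is likewise an assumption made for illustration.

```python
def pay_per_word(total_pay_usd, words_written):
    """Payment divided by output, in USD per word."""
    return total_pay_usd / words_written


def ballpark_check(value, benchmark_low, benchmark_high, factor=10.0):
    """Flag a value only if it is off from the benchmark range by `factor` or more."""
    if value < benchmark_low / factor:
        return "egregiously low"
    if value > benchmark_high * factor:
        return "egregiously high"
    return "in the ballpark"


# Hypothetical inputs: $50,000 paid for 500,000 words, compared against a
# made-up industry range of $0.05 to $0.50 per word.
rate = pay_per_word(50_000, 500_000)
verdict = ballpark_check(rate, benchmark_low=0.05, benchmark_high=0.50)
print(f"${rate:.2f} per word: {verdict}")
```

The wide tolerance is deliberate: a sanity check is only meant to catch values that are clearly off, not to settle whether the rate is exactly right.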
Optimistic estimation for early causes
Occasionally, I’ve seen some optimistic cost-effectiveness estimates by advocates of a particular cause area or approach (e.g., here, here, or here). One possible concern here is that because it’s the advocates who are doing these cost-effectiveness estimates, they might be biased upwards. But even if they are biased upwards, they are not completely uninformative: they show that there exist at least some assumptions and parameters, chosen by someone who is trying their best, under which the proposed intervention looks great. And then further research might reveal that the initial optimism is or isn’t warranted. But that first hurdle isn’t trivial.
Other examples
- You can see the revival of LessWrong pretty clearly if you look at the number of votes per year. Evaluating the value of that revival is much harder, but one first sanity check is to see whether there was some reviving being done.
- When evaluating small purchases, sometimes the cost of the item is much lower than the cost of thinking about it, or the cost of the time one would spend using the item (e.g., for me, the cost of a hot chocolate is smaller than the cost of sitting down to enjoy a hot chocolate). I usually take this as a strong sign that the price shouldn’t be the main consideration for those types of purchase, and that I should remember that I am no longer a poor student.
- Some causes, like rare diseases, are not going to pass a cost-effectiveness sanity check, because they affect too few people.
- If you spend a lot of time in front of a computer, or having calls, the cost of better computer equipment and a better microphone is most likely worth it. I wish I’d internalized this sooner.
- Raffles and lotteries (e.g., “make three forecasts and enter a lottery to win $300”, or “answer this survey to enter a raffle to win $500”) are usually not worth it, because they don’t reveal the number of people who enter, and it’s usually fairly high.
- etc.
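The raffle item in the list above can be turned into a quick expected-value check. This is a rough sketch: the number of entrants, the minutes needed, and the hourly value of time are all hypothetical guesses, since, as noted, organizers usually don't reveal how many people enter.

```python
def raffle_expected_value(prize_usd, entrants):
    """Expected winnings if every entrant is equally likely to win."""
    return prize_usd / entrants


def worth_entering(prize_usd, entrants, minutes_needed, hourly_value_usd):
    """Compare the expected winnings with the cost of the time spent entering."""
    time_cost_usd = minutes_needed / 60 * hourly_value_usd
    return raffle_expected_value(prize_usd, entrants) > time_cost_usd


# Hypothetical: a $500 raffle with a guessed 1,000 entrants, a 10-minute
# survey, and time valued at $30/hour.
ev = raffle_expected_value(500, 1_000)
print(f"Expected value: ${ev:.2f}")
print("Worth entering:", worth_entering(500, 1_000, 10, 30))
```

The same comparison of price against time cost applies to the small-purchases item in the list.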
Conclusion
I explained why I like estimates as sanity checks: they are useful, cheap, and very defensible. I then gave several examples of dead-simple sanity checks, and in each case pointed to more elaborate follow-up estimates.
Great post, Nuño!
Your 1st example about the Photo Patch Foundation reminded me of SoGive's shallow analyses, whose methodology is described here. I encourage people interested in practicing estimation to check them out.
To illustrate, here are the summaries of the 1st 3 I did during my SoGive volunteering back in 2021 (which were actually my 1st 3 EA-type analyses!):
Analysis of Royal Opera House:
Analysis of The Church Of Jesus Christ Of Latter Day Saints:
Analysis of The British Museum Trust:
Thanks Vasco, these are great. Though, where are you getting the depression baseline from?
Ah, sorry, I should have clarified. I used SoGive's Gold Standard Benchmark of 200 £ per year of severe depression averted. This was obtained by surveying a nationally representative sample of 500 UK residents, 13 EAs, and SoGive's team (see details in the link). I suppose Ishaan will come up with a better estimate in the process of this.
Amazing, I think this is a great (if fairly intuitive) concept, and I feel like this post might deserve more attention.
I think I do this quite a lot, but I haven't seen this crystallised so well before. I think we should all be sanity checking all the time.
I did have to sanity check one of your sanity checks though. Some "Neglected diseases" (as defined by the WHO) actually affect lots of people. E.g. Schistosomiasis infects something like 340 million people and might cause something like 2 million DALYs a year, which is hardly chicken feed ;)
Also am honoured (sort of) that you included my analysis of OneDay Health in your examples haha
Thanks Nick.
Yeah, I was thinking more like rare genetic diseases. Edited to say rare rather than neglected
I agree with the overall point that sanity-checking with estimation is a good idea, but I don't find the Photo Patch Foundation example very compelling for this point. $2.50/photo or 12 quality-adjusted days of life per photo seems acceptable to me given the long-term productivity benefits of improving the morale of both the incarcerated parent and the kid.
Yeah, I agree they seem acceptable/good on an absolute level (though as mentioned I think that much better interventions exist).