
Over the years, I have learned many things that are rarely taught about doing cost-benefit or welfare analysis. Here are a few things that I often end up repeating when I mentor individuals or teams working on these kinds of projects:

A Point Estimate is Always Wrong

For any purpose other than an example calculation, never use a point estimate. Always do all math in terms of confidence intervals. All inputs should be ranges or probability distributions, and all outputs should be presented as confidence intervals.

Do not start with a point estimate and add the uncertainty later. From day one, do everything in ranges. Think in terms of foggy clouds of uncertainty. Imagine yourself shrinking the range of uncertainty as you gather more data.

This Google Sheets Template allows you to easily set up Monte Carlo estimations that turn probabilistic inputs into confidence-interval outputs.
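If it helps to see the logic outside a spreadsheet, here is a minimal sketch of the same idea in Python, with hypothetical inputs and ranges; the template does the equivalent work with Sheets formulas, and the code is only here to make the simulation explicit.

```python
import random

def one_simulation():
    # Hypothetical inputs, each entered as a range rather than a point estimate.
    people_reached = random.uniform(5_000, 20_000)
    benefit_per_person = random.uniform(10, 80)      # dollars per person
    cost = random.uniform(100_000, 400_000)          # total program cost, dollars
    return people_reached * benefit_per_person - cost

# Run many iterations, then report a confidence interval instead of a single number.
results = sorted(one_simulation() for _ in range(10_000))
low = results[int(0.05 * len(results))]    # 5th percentile
high = results[int(0.95 * len(results))]   # 95th percentile
print(f"Net benefit, 90% interval: ${low:,.0f} to ${high:,.0f}")
```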

Use Google Sheets

I have experience programming in half a dozen languages, including R. Sometimes they are useful or necessary for certain kinds of data analysis. But I have learned that for almost all cost-benefit analyses, it is best to use Google Sheets, for several reasons.

The main one is transparency. A cost-benefit or welfare analysis is a public-facing document, not an academic one. You should not use esoteric tools unless absolutely necessary. Anyone in your society with basic literacy and numeracy should be able to read over and double-check your work. When you are done and ready to publish, you make your Sheet visible to everyone, and add a link to it in your report. Then anyone can see what you did, and effortlessly copy your code to refine and extend it, or just play around with different priors and assumptions.

This transparency also helps improve results and correct mistakes as you are doing the work. The more people review your math, the better it will be. The number of people who are willing and able to look over a spreadsheet is orders of magnitude higher than the number of people who can debug your code. Spreadsheets allow you to have meetings with your boss or subject matter experts where you walk them through the calculations.

Also, this is probably a group project. Online sheets make collaboration frictionless. Everyone can easily collaborate without passing code files back and forth. 

Finally, unless you have truly top-tier customized workflow and tooling, spreadsheets really are the best programming language for this kind of work. This is because they display all of the numbers at once, immediately. You see all of the intermediate calculations in a way that you almost never do when using loop-based languages, which helps you keep track of things. And anything you do causes everything to update instantly, so you can rapidly get a feel for how sensitive the outputs are to various changes in the inputs. Spreadsheets let you really grok and master the data and calculations, instead of seeing them as the output of a magic box.

In this kind of work, there is almost never a need to optimize on processor cycles. And there isn't really much difference between 1,000 Monte Carlo simulations and 100,000.

Good programming practices still apply:

  1. Never hardcode any number into any equation. Not even the number of days in the year. All equations take all inputs from other cells.
  2. Show and label every step of the calculation.
  3. You can name cells and use the names as global variables.
  4. Cite every number you use, with a link to the source. (This is another thing that makes spreadsheets better than other languages: in code you can and should put citations with URLs in comments, but they are not native hyperlinks.)
  5. You can do version control by copying the sheet and adding a version number to the new copy when making a significant change. (Put a link to the new sheet in the old one and tell the team what you did.) But the system automatically saves almost everything and allows you to roll back to a previous version if necessary.

One Individual, One Event

Work and think and present results at the smallest possible level, ideally one individual and one event. Do not default to trying to create a summation of a large system or population. This has several benefits:

  1. This takes you out of the abstract and into the concrete, and forces you to confront the messy complexity of reality. It will make your estimations much better, by grounding them in something that you can properly visualize and have a better gears-level understanding of.
  2. It makes it easier to extend your work to new situations. You have provided people with solid bricks that they can use for other things.
  3. It makes better stories and talking points. People's eyes glaze over at millions and billions of dollars worth of monetized QALYs. They will pay attention to a memorable fact about the value of intervening in one person's life.
  4. It helps guide individual action, if government action fails. If people learn that an intervention is expected to produce $100-$400 of value for someone in their situation, they can make intelligent choices about what kinds of costs they should pay to do the thing.
  5. If you try to do the summation before gaining a very good understanding of the core thing you are adding up, you are much more likely to mess up the summation. The final number will often be at least three nested levels of summations and derivatives, and you will fail badly if you do not start at the smallest level and work through the summations very carefully.

Minimum Viable Product and Iterate

Aside from a few specialized subfields where things are very routine, every cost-benefit analysis is a new adventure. You have no idea what kinds of data are available or how hard it will be to collect them. So you need to start with the smallest, simplest analysis, with the fewest complications. Even if you have months, start by doing the best you can in a single day of typing things into Google Scholar. Work out the costs and benefits of one intervention for one individual in the simplest situation with the fewest complications.

This will give you a sense of the field, and what kinds of data are easy and hard to obtain. And it often serves as a lower or upper bound on the real analysis. Over time, you can add complications, and do research to narrow the confidence intervals on your initial rough guesses.

This procedure means that you always have something to present. You will never fail to provide a guess that improves people's knowledge. You can usually give a useful intermediate calculation in your progress reports. The worst that can happen is that your confidence interval will be wider than you would like, and/or you have not worked out an aggregate total. Even if your guess is limited to an unrealistic condition, that's usually more information than we had before, and it can help calibrate people's intuitions about which policies are most important to pursue. 

Just be sure to maintain enough mindfulness and scout-mindset to be willing to heavily modify or even throw away your early guess as new data comes in.

Deliver What the Customer Needs

Every analysis exists for a reason. It is a communications tool, designed to achieve an objective. As you iterate, make sure that you are iterating towards something that helps the customer. Understand what they want, why they want it, and what they think a good job looks like.

You should always maintain professional ethics and honesty. Never lie. However, there is a huge difference between 'academic standards' and 'professional ethics'. The customs of your academic discipline are not the laws of honesty. There are a vast number of true and helpful things you could say about the topic, and a multitude of non-misleading ways of presenting your results, so you should choose the ones that the client needs. 

Be flexible about anything other than truth, and understand that academic pedantry and caveats are not truth. The audience of the report will probably not be specialists or analytic philosophers; they will be normal people using normal linguistic conventions, which assume fuzziness and caveats in everything. Presenting a simple, plain-language conclusion that is true 95% of the time will, if people remember and act on it, add more truth to the world than an exquisitely accurate statement that nobody outside your discipline can read.

If a lawyer or PR person ever tells you not to use a word, listen to them. There will always be synonyms that have the same meaning but will cause less trouble. Your goal is to teach people about reality, and people don't learn when they are triggered or on tilt, so avoid the cursed words that shut down rational thought. The same thing applies to tone and style. Something that feels inferior to you and your social group might do a better job of helping the target audience better navigate reality.

I will even go as far as to present point estimates if someone makes a good case for why they are needed to communicate well in a particular circumstance. I will fight for my precious confidence intervals, and try my best to educate the client, but I will only fight so much. There are ways of making point estimates carry truth, in ways that lessen people's ignorance, and I will use them if I must.

Also, keep in mind that the analysis will never be 'done' or 'finished' or 'right'. You can add an infinite amount of effort to any analysis to make it better and narrow your uncertainty. It will always be incomplete. Your job is to do the most important parts first, and then keep working until the marginal value of additional refinement, to your customer, is less than the explicit or implicit rate you are billing them (or your personal opportunity cost if you are working pro bono).

Ruthlessly Simplify Literature

If you are lucky, you will be able to draw on a beautiful body of literature, one that is vast and rich. You must resist the temptation to get nerd-sniped, or you will never get finished. Extract the number you need, in the form of a confidence interval that takes into account the relevant uncertainty, and move on.

Very often a literature review will lead to a conclusion like "This number could be anywhere from 10 to 80. An expert in the field can predict it to ± 3, based on a systemic analysis of dozens of variables." Unless the number is a truly key input (one that a lot of things are multiplied by, and the biggest remaining source of uncertainty), don't try to incorporate this complexity into your calculation. Just enter 'Uniform distribution, 10 to 80' into the Monte Carlo and move on to the next literature review for the next value. If you have time later, you might be able to shrink it based on a more nuanced understanding of the literature, but that should look more like 'Uniform distribution, 15 to 40' than trying to model everything.

Do Longform NPV

Academics are never taught the financial calculations that are necessary to do a cost-benefit analysis. Their biggest mistakes usually come from an improper Net Present Value calculation. For this reason, you need to do them 'by hand' until you have a good intuition for how they work.

It is often messy or difficult to do longform NPV calculations on a confidence interval. You can find the confidence interval for the core value first, and then do the NPV summation on the bounds of that interval. Use a new blank tab of the sheet (so you can work in a grid, rather than the linear form needed for the Monte Carlo sheet).

Link to the earlier calculation result, and label it as this year's value (t=0). On a new line, display next year's value (t=1). Usually, at first, this is the same as the first year's value. Then calculate its discounted value in a different cell, based on the time value. Do the math yourself [value / (1 + rate)^time] rather than using any prebuilt formulas. Then repeat in a new line for the next year, and the next, etc. Finally do a summation of all the years. Also separately do a summation of the first 10 years and the first 20 years, if your default timeframe is longer.  Then stare at the numbers for a while so you get a sense of how they relate to each other. Then change the discount rate and look at the new numbers.
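As a minimal sketch of that layout, with placeholder numbers, and in Python only to make the arithmetic explicit; on the sheet, each pass through the loop is simply another row in the grid:

```python
annual_value = 3_000      # placeholder: this year's value from the earlier calculation (t = 0)
discount_rate = 0.03      # placeholder discount rate
years = 30

# One row per year: the undiscounted value, then its discounted value via value / (1 + rate)^t.
rows = []
for t in range(years):
    undiscounted = annual_value                        # often constant, at least at first
    discounted = undiscounted / (1 + discount_rate) ** t
    rows.append((t, undiscounted, discounted))

total = sum(d for _, _, d in rows)
first_10 = sum(d for t, _, d in rows if t < 10)
first_20 = sum(d for t, _, d in rows if t < 20)
print(f"NPV over {years} years: {total:,.0f}")
print(f"First 10 years: {first_10:,.0f}; first 20 years: {first_20:,.0f}")
```

Changing the discount rate (or, on the sheet, the cell that holds it) and watching every row move is exactly the intuition-building exercise described above.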

After you have a good intuition for what the numbers should look like, then you can sometimes use an NPV formula. But often, the situation you are modeling involves complications or different growth rates that make the default formula invalid, so you have to go back to the longform calculation and make adjustments.

You will often generate two layers of nested NPV calculations. For example, assume that a long-lasting vaccine gives annual benefits of $2000-$6000 to a high-risk person who gets it. To find the total value of vaccinating that person, you will need to do an NPV calculation for their future years of being protected. Then, to find the value of one year of a program, you add up the values of all of the people in various situations that will be vaccinated in a year. Then you do another NPV on that sum to find the total expected value of a full multi-year vaccination program. Then, if your intervention is something like funding research, the value of your intervention is some fraction or other derivative of the total expected value of a working vaccination program. If you are not very meticulous, carefully labeling and commenting and thinking about every step of this, you will be wrong.
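A minimal sketch of that nesting follows, with hypothetical counts, durations, and a hypothetical research fraction; only the $2,000-$6,000 annual benefit comes from the example above:

```python
def npv(values, rate):
    """Discount a stream of annual values back to the present: sum of v_t / (1 + rate)^t."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))

rate = 0.03  # placeholder discount rate

# Layer 1: value of vaccinating one person = NPV of their protected years.
# 4,000 is a midpoint of the 2,000-6,000 range; 10 protected years is hypothetical.
high_risk_person = npv([4_000] * 10, rate)
low_risk_person = npv([500] * 10, rate)          # hypothetical lower-risk group

# Sum over everyone vaccinated in one program year (hypothetical counts).
one_program_year = 50_000 * high_risk_person + 200_000 * low_risk_person

# Layer 2: NPV of a multi-year vaccination program (hypothetical 5-year program).
program_value = npv([one_program_year] * 5, rate)

# If the intervention is funding research, take some fraction of that total (hypothetical 10%).
research_value = 0.10 * program_value
print(f"Per high-risk person: {high_risk_person:,.0f}")
print(f"Full program: {program_value:,.0f}; research share: {research_value:,.0f}")
```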

Consider the Pair Programming methodology here. With every calculation, have a team member looking over your shoulder and asking what you are doing and why. At the very least, you should have a presentation to your boss or client or other stakeholders where you walk through each step of the calculation, making sure that each intermediate number is a well-explained talking point.

Breadth Beats Rigor

Your policy or action will change the world in hundreds of ways. Having a half-decent guess at the magnitude of 70 of those changes is much better than an academically-rigorous estimate of 10 of them. 

After you do the simple and obvious minimum analysis, the next step is to brainstorm all of the ways that the world will change from the baseline. Then do some napkin math on all of those changes to guess at their highest and lowest plausible effects. Then focus your analytical work on those with a confidence interval that includes the largest numbers.
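A napkin-math sketch of that triage step, with made-up changes and bounds; the point is just to rank by the largest plausible magnitude:

```python
# (change, lowest plausible effect, highest plausible effect), in dollars per year; all made up.
changes = [
    ("fewer hospitalizations",       1_000_000, 40_000_000),
    ("compliance paperwork time",   -8_000_000,   -500_000),
    ("reduced absenteeism",            200_000,  3_000_000),
    ("equipment upgrade costs",     -2_000_000,   -100_000),
]

# Rank by the largest plausible magnitude, so analytical effort goes where it can matter most.
ranked = sorted(changes, key=lambda c: max(abs(c[1]), abs(c[2])), reverse=True)
for name, low, high in ranked:
    print(f"{name}: {low:,} to {high:,}")
```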

Use Multiple Specifications

One of the best epistemological habits of economists is that they expect papers to present multiple specifications. In any good econ paper, there is a table showing the result of at least 6 specifications, usually starting with a simple OLS and then working up to more complicated regressions that account for more things. These serve as a kind of sanity check, and a way to show that the author is less likely to be p-hacking or data-mining. 

You should do something similar. Report the output of many different calculation methods, based on different assumptions or levels of methodological complexity. This communicates your rigor, honesty, and uncertainty to the audience. If things go well, then several completely different methods of arriving at the number, using different data sources and assumptions, will yield roughly similar confidence intervals.

But also, you can combine these specifications to produce a comprehensive uncertainty range. The beautiful and powerful thing about Monte Carlo estimation is that it allows you to include all kinds of uncertainty, including worldview and methodological uncertainty. You never have to make a hard choice. When in doubt, do everything, and then each iteration of the simulation chooses one of the calculation methods at random according to a credence weight. (Do not average them together. That destroys data and generates fake certainty.)

There is no standard methodology for assigning credence to the various options or methods. You have to do what feels reasonable, based on your expert judgment of how the different maps probably relate to the territory. But this is much more honest and transparent than only presenting one method and pretending that the others do not exist. If you are capable of choosing one thing from a menu of options, then you are also capable of assigning weights to every option and using those weights in the simulation.
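A minimal sketch of that mixture, with made-up specifications and credence weights:

```python
import random

# Three hypothetical ways of calculating the same quantity, each returning one simulated value.
def simple_bottom_up():
    return random.uniform(50, 150)

def adjusted_for_compliance():
    return random.uniform(80, 120)

def top_down_benchmark():
    return random.uniform(20, 200)

# Credence weights: a judgment call about how well each method maps onto the territory.
methods = [simple_bottom_up, adjusted_for_compliance, top_down_benchmark]
weights = [0.5, 0.3, 0.2]

# Each iteration picks ONE method at random according to its weight (never an average of all three),
# so the output interval carries the methodological uncertainty along with everything else.
results = sorted(random.choices(methods, weights=weights)[0]() for _ in range(10_000))
print(f"90% interval: {results[500]:.0f} to {results[9500]:.0f}")
```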

Become or Hire a Journalist

By 'journalist' I mean an old-school investigative reporter who tracks down cagey sources, gets them to agree to talk, and extracts the truth from people who have no incentive to tell it. When doing a cost-benefit analysis of a new policy or regulation, you usually end up in territory where journalistic methods of truth-seeking are more effective and accurate than academic methods.

When doing a cost-benefit analysis, you need to know how much it will cost to do a new thing. Not how much an academic thinks it should cost under ideal frictionless conditions, but how much it will really cost. You will not find this data in the published literature, and you will not find it by doing a survey. You must talk to informants who have experience doing the thing in real-world conditions.

Very often, there are only a couple dozen people in the world who have the knowledge to accurately estimate the relevant costs. They are usually people with management or operational responsibilities at for-profit companies, which means that their time is very valuable and they are hard to track down and talk to. Furthermore, they have an incentive to mislead you, because they will avoid a lot of hassle if they can present data that will kill the regulation.

Most academics lack the skills to navigate this territory well, and it requires a change in mindset. When an academic reads a sentence like "We conducted structured interviews of five C-level executives at firms who would be affected by the rule" their typical reaction is "N=5? That's garbage data". But when a policy-maker or politician reads that sentence, their thoughts will be more like "Huh, they actually took the effort to talk to a key constituency and get some buy-in. These are responsible adults who can be trusted with power."

It is truly shocking how many cost-benefit analyses are done without talking to a single responsible or knowledgeable person who would be affected by the regulation. You can do better. But you will not be getting those interviews unless you competently do journalist-type things.

Here are some resources on interviews. Notice the convergent evolution between 'health policy key informant interview guidelines' and 'how to be a good journalist', and focus especially on the common themes:

https://healthpolicy.ucla.edu/sites/default/files/2023-08/tw_cba23.pdf

https://ijnet.org/en/story/12-dos-and-donts-journalistic-interviews

Of course, you should not simply trust what your informants say. Ex-ante estimates of regulatory burden are usually higher than the actual burden, even when people are trying to be honest, because they fail to anticipate process innovation. (Although the distribution of ex-post burdens is skewed, and some regulations end up costing much more than estimated.) You should also do the 'estimate what costs would be in an ideal frictionless world' math, and treat these as two bounds of an uncertain distribution. And look for ex-post estimates of the burden of similar policies.

Even if you have good alternate sources for cost data, talking to people responsible for complying with the regulation is still vital, because they will often tell you about categories of costs or side effects that you did not know about. Usually these are related to learning, training, monitoring, verification, corrective action, and frictions imposed on seemingly-unrelated processes, but there are always surprises and new things you need to consider and add in.

In most cases, if your team is more than two people, the third person should devote their time to obtaining and arranging interviews. (The others can and should join the call, if possible, and may ask most of the questions, but it takes a lot of work to secure and arrange the interview.) This will produce more value than a third analyst doing science-type things.

If you have contracting or hiring authority, consider bringing on someone with the right kind of reporting experience, rather than trying to turn an analyst into a journalist. Finding and interviewing sources is a task that requires charisma and persistence and skills that good journalists learn, like going to industry conferences and chatting up people in the elevator, and not the skills and traits that academics are selected for.

And as a bonus, you can have the journalist write the final report. They will probably write in language that is much more readable and impactful than the academics would use.

Time Costs Usually Dominate

Most of the costs of most health-related policies come from consuming people's time, rather than from accounting costs. Sometimes these people are being paid for their time, but usually they are not. If you only look at accounting costs, you could easily underestimate the true social costs by an order of magnitude. Consider everyone who is affected, and estimate their time cost of the policy.

When looking for people to interview and thinking of what questions to ask, do not look for ‘people who estimate costs’ and do not focus on accounting costs. The true costs of policies and regulations are mostly not legible to accountants; they are seen by managers who have a feel for how much time is consumed. Basically all of your expert elicitation and interview questions will be of the form of ‘How much time will it take a person to do tasks X, Y, and Z?’ Then you multiply by the relevant salary/wage and overhead rate to produce the cost estimate.
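The arithmetic is simple; here is a sketch with made-up interview answers and a placeholder wage and overhead rate:

```python
# Made-up answers from interviews: hours per affected person per year, by task.
hours_per_task = {"initial training": 6, "quarterly reporting": 8, "record keeping": 12}
total_hours = sum(hours_per_task.values())

hourly_wage = 35.0         # placeholder wage for the affected role
overhead_multiplier = 1.5  # placeholder loading for benefits, supervision, facilities
people_affected = 40_000   # placeholder

annual_time_cost = total_hours * hourly_wage * overhead_multiplier * people_affected
print(f"Annual time cost: ${annual_time_cost:,.0f}")
```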

Analyses are Worthless, but Analyzing is Everything

There's a military-adjacent saying often attributed to Eisenhower: "Plans Are Worthless, But Planning Is Everything". No battle plan ever survives contact with the enemy, and yet the habit of making battle plans is one of the main things that separates Great Powers from nations that get swept into the dustbin of history. Pondering this fact leads to good insights about the relative value of product versus process, and how to maintain a community of expertise.

Something similar is true of your cost-benefit analysis. It will probably be wrong. Your final number will, in many important ways, be fake. But when the analysis is done right, it makes you much more competent to say things about the policy, ideally to improve it and guide its implementation. It is a strong signal that you have properly done your due diligence and that you are a responsible actor.

Policy-makers who have experience dealing with academics know that many academic results are garbage. They know that people often use academic language to push an agenda or manufacture consent. They will not trust your bottom-line number, nor should they. But if you have done a good job, the main message they will get from the report is that you have made an intellectually honest effort to learn the details of the policy, and to think about what will actually happen when it collides with reality.

Policy-makers with positions of responsibility in democratic countries do not really care about DALYs. They do want to be able to tell a story about how they made things better, but mainly they want to avoid nasty surprises. They want to know the likely side effects and what kind of trouble might pop up. They want to know who will be harmed by the policy, and how much those people are likely to fight. If you can give them a decent guide to these things, they will be more willing to trust you.

Ask for Help

Talk to people who have done this kind of work before. Cold-email academics who have published related work that you find helpful, especially if the work has a vibe of being policy-focused or grounded in reality, and arrange a call where you present your results to them and ask if they think your work is reasonable. Most academics are desperate for validation that their work matters and is being listened to.

I am almost always in a position to advise EAs who are working on these kinds of analyses. My email is bruns@jhu.edu.

Comments

This is my favorite in the series so far. I really enjoy the tacit knowledge flavor of it and that some of these lessons generalize beyond the CBA domain.

Thanks! That was what I was hoping for. I've learned things since I started this series, and one of the main ones was to be less academic and more practical.

Google Sheets also has a criminally underappreciated tool - Google Apps Script

You can write custom code in JavaScript and have it run on a schedule, as a function in a spreadsheet cell, or as a clickable menu item

To add to this, combining this with LLMs is very powerful. If you describe the structure of your sheets and how you want them manipulated to ChatGPT, it will (in my experience) output hundreds of lines of code that will work on the first try.
This has turned me from Just Some Guy into a capable programmer at work, it's crazy.

Nice, Thanks!

I do a lot of modelling in my job, and I have to say this is the best tacit knowledge piece I've read on modelling in a while (the MC gsheet template is a nice bonus too). Bookmarked for (I expect) frequent future reference. Thanks Richard. 

Thanks. What are some other good ones you have read?

On the more practical side, froolow's A critical review of GiveWell's 2022 cost-effectiveness model. GiveWell's CEA spreadsheets now are a lot better in many ways than back then, when they had the same kinds of model design and execution issues as the ones I used to see in my previous day job managing spreadsheet-based dashboards to track management metrics at a fast-growing company full of very bright inexperienced young analysts — this part resonated with my daily pain, as a relative 'non-genius' versus my peers (to borrow froolow's term):

It is fairly clear that the GiveWell team are not professional modellers, in the same way it would be obvious to a professional programmer that I am not a coder (this will be obvious as soon as you check the code in my Refactored model!). That is to say, there’s a lot of wasted effort in the GiveWell model which is typical when intelligent people are concentrating on making something functional rather than using slick technique. A very common manifestation of the ‘intelligent people thinking very hard about things’ school of model design is extremely cramped and confusing model architecture. This is because you have to be a straight up genius to try and design a model as complex as the GiveWell model without using modern model planning methods, and people at that level of genius don’t need crutches the rest of us rely on like clear and straightforward model layout. However, bad architecture is technical debt that you are eventually going to have to service on your model; when you hand it over to a new member of staff it takes longer to get that member of staff up to speed and increases the probability of someone making an error when they update the model.

Angelina Li's Level up your spreadsheeting (longer version: Level up your Google Sheets game) is great too, and much more granular. I would probably recommend their resource to most folks for spreadsheeting in general, and yours for CBAs more specifically.

On the "how to think about modelling better more broadly" side, Methods for improving uncertainty analysis in EA cost-effectiveness models, also by froolow, is one I think about often. I don't have a health economics background, so this argument shifted my perspective:

Uncertainty analysis is a major omission from most published EA models and seems to me like the proverbial ‘hundred dollar bill on the sidewalk’ – many of the core EA debates can be informed (and perhaps even resolved) by high-quality uncertainty analysis and I believe this could greatly improve the state of the art in EA funding decisions.

The goal of this essay is to change the EA community’s view about the minimal acceptable standard for uncertainty analysis in charity evaluation. To the extent that I use the GiveWell model as a platform to discuss broader issues of uncertainty analysis, a secondary goal of the essay is to suggest specific, actionable insights for GiveWell (and other EA cost-effectiveness modellers) as to how to use uncertainty analysis to improve their cost-effectiveness model.

This contributes to a larger strategic ambition I think EA should have, which is improving modelling capacity to the point where economic models can be used as reliable guides to action. Economic models are the most transparent and flexible framework we have invented for difficult decisions taken under resource constraint (and uncertainty), and in utilitarian frameworks a cost-effectiveness model is an argument in its own right (and debatably the only kind of argument that has real meaning in this framework). Despite this, EA appears much more bearish on the use of economic models than sister disciplines such as Health Economics. My conclusion in this piece is that there is scope for a paradigm shift in EA modelling which will improve decision-making around contentious issues.

This too, further down (this time emphasis mine): 

There is probably no single ‘most cost-effective use of philanthropic resources’. Instead, many people might have many different conceptions of the good which leads them to different conclusions even in a state of perfect knowledge about the effectiveness of interventions [1]. From reading the forums where these topics come up I don't think this is totally internalised - if it was totally internalised people would spend time discussing what would have to be true about morality to make their preferred EA cause the most cost-effective, rather than arguing that it is the actual best possible use of resources for all people [2].

Insofar as the GiveWell model is representative, it appears that resolving 'moral' disagreements (e.g. the discount rate) are likely to be higher impact than 'factual disagreements' (e.g. the effectiveness of malaria nets at preventing malaria). This is not unusual in my experience, but it does suggest that the EA community could do more to educate people around these significant moral judgements given that those moral judgements are more 'in play' than they are in Health Economics. Key uncertainties which drive model outputs include:

  • What should the discount rate for life-years and costs be? (And should it be the same for both?)
  • What is the ratio at which we would trade life-years for consumption-doublings?
  • How could we strengthen our assumptions about charity level adjustments?
  • How risk-averse should we be when donating to a charity with both upside and downside risk?

This is fantastic, thank you! Have already sent it to someone considering doing a CBA

"For any purpose other than an example calculation, never use a point estimate. Always do all math in terms of confidence intervals. All inputs should be ranges or probability distributions, and all outputs should be presented as confidence intervals."

I weakly disagree with this "never" statement, as I think there is value in doing basic cost-benefit analysis without confidence intervals, especially for non-mathsy individuals or small orgs who want to look at the potential cost-effectiveness of their own or others' interventions. I wouldn't want to put some people off by setting this as a "minimum" bar. I also think that simple "lower and upper bound" ranges can sometimes be an easier way to do estimates, without strictly needing to calculate a confidence interval.

In saying that, when big organisations do CBAs to actually make decisions or move large amounts of money, or for any academic purpose, then yes, I agree confidence intervals are what's needed!

I would also say that, for better or worse (probably for worse), the point estimate is by far the most practically discussed and used output of any CBA, so I think it's practically more important to put effort into getting your point estimate as accurate as possible than it is to make sure your range is accurate.

Nice job again.

I agree with him on inputs, but often the expected value is the most important output, in which case point estimates are still informative (sometimes more so than ranges). Also, CIs are often not the most informative indicator of uncertainty; a CEAC, CEAF, VOI, or p(error) given a known WTP threshold is often more useful, though perhaps less so in a CBA rather than a CEA/CUA.

If you have a sophisticated audience, sure. But outside a few niche subfields, very few people working in policy understand any of those things. My advice is for dealing with an audience of typical policy makers, the kind of people that you have to explain the concept of 'value of statistical life' to. If you give them an expected value, they will mentally collapse it into a point estimate with near-zero uncertainty, and/or treat it as a talking point that is barely connected to any real analysis.

Thanks!

You are right, and I did walk it back a little in the "Deliver what the customer needs" section.

But you would have been more right before I posted the link to that Google Sheets Monte Carlo template. Before I had that, I would often default to a point estimate for very fast jobs. But this tool makes it a lot easier to start with the Monte Carlo to see if it works and is accepted.

Ha, I love this! I will definitely check that simulator out, nice one!

Agree with others. This is fantastic. Learned a lot.


Executive summary: Effective cost-benefit analysis requires probabilistic thinking, transparency, iteration, and clear communication tailored to the audience, with an emphasis on real-world costs and investigative methods.

Key points:

  1. Use Probabilistic Thinking – Always work with confidence intervals and probability distributions rather than point estimates.
  2. Prioritize Transparency and Accessibility – Google Sheets enhances collaboration, reviewability, and comprehension for a broad audience.
  3. Start Small and Iterate – Begin with a simple, minimal analysis and refine based on available data and stakeholder needs.
  4. Investigate Real-World Costs – Use journalistic methods to gather practical cost estimates from industry experts rather than relying solely on academic sources.
  5. Tailor Communication to Decision-Makers – Focus on clear, actionable insights rather than academic rigor, emphasizing practical impact over technical precision.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Thanks for the post, Richard.

For any purpose other than an example calculation, never use a point estimate. Always do all math in terms of confidence intervals. All inputs should be ranges or probability distributions, and all outputs should be presented as confidence intervals.

I have run lots of Monte Carlo simulations, but have mostly moved away from them. I strongly endorse maximising expected welfare, so I think the final point estimate of the expected cost-effectiveness is all that matters in principle if it accounts for all the considerations. In practice, there are other inputs that matter because not all considerations will be modelled in that final estimate. However, I do not see this as an argument for modelling uncertainty per se. I see it as an argument for modelling the considerations which are currently not covered, at least informally (more implicitly), and ideally formally (more explicitly), such that the final point estimate of the expected cost-effectiveness becomes more accurate.

That being said, I believe modelling uncertainty is useful if it affects the estimation of the final expected cost-effectiveness. For example, one can estimate the expected effect size linked to a set of RCTs with inverse-variance weighting from w_1*e_1 + w_2*e_2 + ... + w_n*e_n, where w_i and e_i are the weight and expected effect size of study i, and w_i = 1/"variance of the effect size of study i"/(1/"variance of the effect size of study 1" + 1/"variance of the effect size of study 2" + ... + 1/"variance of the effect size of study n"). In this estimation, the uncertainty (variance) of the effect sizes of the studies matters because it directly affects the expected aggregated effect size.
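For illustration, a minimal sketch of this inverse-variance weighting in Python, with made-up effect sizes and variances:

```python
# Made-up effect sizes and variances for three studies.
effects = [0.30, 0.45, 0.10]
variances = [0.01, 0.04, 0.09]

# w_i = (1 / variance_i) / sum_j (1 / variance_j)
precisions = [1 / v for v in variances]
weights = [p / sum(precisions) for p in precisions]

pooled_effect = sum(w * e for w, e in zip(weights, effects))
print(f"weights: {[round(w, 3) for w in weights]}, pooled effect: {pooled_effect:.3f}")
```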

Holden Karnofsky's post Why we can’t take expected value estimates literally (even when they’re unbiased) is often mentioned to point out that unbiased point estimates do not capture all information. I agree, but the clear failures of point estimates described in the post can be mitigated by adequately weighting priors, as is illustrated in the post. Applying inverse-variance weighting, the final expected cost-effectiveness is "mean of the posterior cost-effectiveness" = "weight of the prior"*"mean of the prior cost-effectiveness" + "weight of the estimate"*"mean of the estimated cost-effectiveness" = ("mean of the prior cost-effectiveness"/"variance of the prior cost-effectiveness" + "mean of the estimated cost-effectiveness"/"variance of the estimated cost-effectiveness")/(1/"variance of the prior cost-effectiveness" + 1/"variance of the estimated cost-effectiveness"). If the estimated cost-effectiveness is way more uncertain than the prior cost-effectiveness, the prior cost-effectiveness will be weighted much more heavily, and therefore the final expected cost-effectiveness, which integrates information about the prior and estimated cost-effectiveness, will remain close to the prior cost-effectiveness.

It is still important to ensure that the final point estimate for the expected cost-effectiveness is unbiased. This requires some care in converting input distributions to point estimates, but Monte Carlo simulations requiring more than one distribution can very often be avoided. For example, if "cost-effectiveness" =  ("probability of success"*"years of impact given success" + (1 - "probability of success")*"years of impact given failure")*"number of animals that can be affected"*"DALYs averted per animal-year improved"/"cost", and all these variables are independent (as usually assumed in Monte Carlo simulations for simplicity), the expected cost-effectiveness will be E("cost-effectiveness") = ("probability of success"*E("years of impact given success") + (1 - "probability of success")*E("years of impact given failure"))*E("number of animals that can be affected")*E("DALYs averted per animal-year improved")*E(1/"cost"). This is because E("constant a"*"distribution X" + "constant b") = a*E(X) + b, and E(X*Y) = E(X)*E(Y) if X and Y are independent. Note:

  • The input distributions should be converted to point estimates corresponding to their means.
    • You can make a copy of this sheet (presented here) to calculate the mean of uniform, normal, loguniform, lognormal, pareto and logistic distributions from 2 of their quantiles. For example, if "years of impact given success" follows a lognormal distribution with 5th and 95th percentiles of 3 and 30 years, one should set the cell B2 to 0.05, C2 to 0.95, B3 to 3, and C3 to 30, and then check E("years of impact given success") in cell C22, which is 12.1 years.
    • Replacing an input by its most likely value (its mode), or one which is as likely to be an underestimate as an overestimate (its median) may lead to a biased expected cost-effectiveness. For example, the median and mode of a lognormal distribution are always lower than its mean. So, if "years of impact given success" followed such distribution, replacing it with its most likely value, or one as likely to be too low as too high would result in underestimating the expected cost-effectiveness.
  • The expected cost-effectiveness is proportional to E(1/"cost"), which is only equal to 1/E("cost") if "cost" is a constant, or practically equal if it is a fairly certain distribution compared to others influencing the cost-effectiveness. If "cost" is too uncertain to be considered constant, and there is not a closed formula to determine E(1/"cost") (there would be if "cost" followed a uniform distribution), one would have to run a Monte Carlo simulation to compute E(1/"cost"), but it would only involve the distribution of the cost. For uniform, normal and lognormal distributions, Guesstimate would do. For other distributions, you can try Squiggle AI (I have not used it, but it seems quite useful).
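For illustration, a minimal numerical check of these two notes with made-up independent lognormal inputs: the product of the means matches the Monte Carlo mean of the product, while using 1/E("cost") instead of E(1/"cost") gives a biased answer.

```python
import math, random, statistics

random.seed(0)
n = 200_000

def lognormal_from_quantiles(p5, p95):
    """Sample a lognormal given made-up 5th and 95th percentiles."""
    mu = (math.log(p5) + math.log(p95)) / 2
    sigma = (math.log(p95) - math.log(p5)) / (2 * 1.6449)  # 1.6449 is the z-score of the 95th percentile
    return math.exp(random.gauss(mu, sigma))

benefit = [lognormal_from_quantiles(3, 30) for _ in range(n)]          # e.g. years of impact
cost = [lognormal_from_quantiles(50_000, 500_000) for _ in range(n)]   # e.g. dollars

mc_mean = statistics.fmean(b / c for b, c in zip(benefit, cost))               # full Monte Carlo of benefit/cost
shortcut = statistics.fmean(benefit) * statistics.fmean(1 / c for c in cost)   # E(benefit) * E(1/cost)
naive = statistics.fmean(benefit) / statistics.fmean(cost)                     # uses 1/E(cost): biased low

print(f"Monte Carlo E(benefit/cost): {mc_mean:.6f}")
print(f"E(benefit) * E(1/cost):      {shortcut:.6f}")  # matches, since the inputs are independent
print(f"E(benefit) / E(cost):        {naive:.6f}")     # noticeably lower
```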

Several points:

  1. Doing the Monte Carlo using my sheet is easier than the method you presented for avoiding the Monte Carlo. It presents the mean, which is the expected value, and also the confidence interval.
  2. There are some audiences that already understand uncertainty and have an SBF-style desire to only maximize expected utility. These audiences are rare. Most people need to be shown the uncertainty (even if they do not yet know they need it).
  3. Some people will want or need to take the 'safe option' with a higher floor rather than try to maximize the expected value.
  4. When done right, the confidence interval includes uncertainty in implementation. If it is done by an A-team that gets things right, you will get better results. Knowing the possible range is key to knowing how fragile the expected result is and how much care will be required to get things right.