Laura Duffy

Researcher @ Rethink Priorities
808 karma · Working (0–5 years) · Washington, DC, USA

Bio

I am a Researcher at Rethink Priorities, working mostly on cross-cause prioritization and worldview investigations. I am passionate about farmed animal welfare, global development, and economic growth/progress studies. Previously, I worked in U.S. budget and tax policy as a policy analyst for the Progressive Policy Institute. I earned a B.S. in Statistics from the University of Chicago, where I volunteered as a co-facilitator for UChicago EA's Introductory Fellowship. 

Comments (21)

Hi Henry! The reason why the intervals are so wide is because they're mixing together several models. I've explained more about this modeling choice and result here: https://forum.effectivealtruism.org/posts/rLLRo9C4efeJMYWFM/welfare-ranges-per-calorie-consumption?commentId=Wc2xksAF3Ctmi4cXY

Hi Vasco, 

Thanks for this interesting post, and in general for the amount of time and consideration you've given to analyzing animal welfare issues here on the Forum. I want to reiterate the points others have made in this comment section, and urge you to consider much more explicitly the wide range of uncertainty involved in asking a question like this. In particular, the following model choices are, in my opinion, deserving of a more careful uncertainty treatment in your analysis:

  • The probability of sentience and welfare capacities of mosquitoes. 
    • These may differ substantially from those of black soldier flies, whose sentience and welfare capacities we are also deeply uncertain about.
  • The duration of different types of pain experienced by mosquitoes as they die, conditioned on them having valenced states. 
    • I think we should be much more uncertain about the accuracy of the GPT pain track tool at capturing the true experiences of mosquitoes dying from ITNs. The estimates included in your spreadsheet vary quite a lot. 
  • The number of mosquitoes killed per hour by a net 
  • The weight you should give to excruciating pain relative to longer, less-intense suffering and, to a lesser extent, the weight you give to disabling pain relative to a DALY. 
    • I think it’s an open question, which is quite relevant to this analysis, how pain intensity scales with duration. 
  • Whether the suffering experienced by those who have fatal cases of malaria is accurately accounted for in the analysis. 
    • In particular, pain-track estimates and GBD DALY estimates are different tools for comparing suffering. When I’ve combined/compared them in previous reports, I did so mainly by calibrating the weights for harmful and disabling pain that’s experienced by animals on a more routine basis to the descriptions of different maladies analysed by the GBD. The weight I set on excruciating pain was not high enough to overwhelm the weight given to harmful and disabling pain. 
    • But I think that comparing these two methodologies might break down when you’re giving extremely high weight to short-lived extreme pain since it’s less clear how such pain is incorporated into the DALY estimates. One would need to do a pain-track analysis for the suffering experienced if one has a fatal case of malaria for a true apples-to-apples comparison. I think this would be a fruitful area for more research. 
  • The counterfactual life outcomes and welfare experienced by mosquitoes

Though you mention that there is uncertainty in each of these variables, I think it's important to consider how these uncertainties compound multiplicatively when combined, and their aggregate effect on the range of plausible results. Otherwise, there's a real risk of arriving at a directionally incorrect conclusion that could have big consequences if we act too quickly on it. This, in my view, is especially true if you're bringing a set of controversial assumptions to bear on a sensitive and morally important topic. 
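To illustrate the compounding, here is a minimal Monte Carlo sketch. Every distribution and number below is a hypothetical placeholder of mine (not an estimate from the post); the point is only that a handful of factors, each uncertain over roughly an order of magnitude, yield a product that is uncertain over several orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical lognormal uncertainties (placeholders, not the post's numbers);
# sigma = 1 means each factor's own 90% interval spans roughly a factor of ~27.
p_sentience   = rng.lognormal(mean=np.log(0.1), sigma=1.0, size=n)
pain_duration = rng.lognormal(mean=np.log(10.0), sigma=1.0, size=n)  # seconds
pain_weight   = rng.lognormal(mean=np.log(1.0), sigma=1.0, size=n)   # vs. a DALY
kills_per_net = rng.lognormal(mean=np.log(5.0), sigma=1.0, size=n)   # per hour

harm = p_sentience * pain_duration * pain_weight * kills_per_net

lo, hi = np.percentile(harm, [5, 95])
print(f"90% interval of the product spans a factor of {hi / lo:,.0f}")
```

Each input's uncertainty looks modest on its own, but the product's 5th–95th percentile range spans a factor of several hundred here, which is why the aggregate interval can matter more than any single input's.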

Hi Vasco, thanks for the question. 

Even though we ourselves are skeptical of the neuron count theory, many people in EA do put significant credence on it. As such, we chose to present the results that include the neuron count model in this particular diagram. Additionally, the differences between the results including and excluding the neuron count model are small. As we've mentioned in this post, our estimates are not meant to be precise -- rather, we think that order-of-magnitude comparisons are probably more appropriate given our significant uncertainty in theories of welfare and how best to represent them in a model.

Hi Vasco, 
Thanks for the good question! I think it's important to note that there are (at least) 3 types of model choices and uncertainty at work:
a) we have a good deal of uncertainty about each theory of welfare represented in the model,
b) we don't have a ton of confidence that the function we included to represent each theory of welfare is accurate (especially the undiluted experiences function, which partially drives the high mean results),
c) we could have uncertainty that our approach to estimating welfare ranges in general is correct, but we've not included this overall model uncertainty. For instance, our model has no "prior" welfare ranges for each species, so the distribution output by the calculation entirely determines our judgement of the welfare range of the species involved. We also might be uncertain that simply taking a weighted mixture of each theory of welfare is a good way to arrive at an overall judgement of welfare ranges. Etc. 

Our preliminary method used in this project incorporates model uncertainty in the form of (a) by mixing together the separate distributions generated by each theory of welfare, but we don't incorporate model uncertainty in the ways specified by (b) or (c). I think these additional layers of uncertainty are epistemically important, and incorporating them would likely serve to "dampen" the effect that the model's mean result has on our all-things-considered judgement about the welfare capacity of any species. Using the median is a quick (though not super rigorous or principled) way of encoding that conservatism/additional uncertainty into how you apply the moral weight project's results in real life. But there are other ways to aggregate the estimates, which could (and likely would) be better than using the median. 
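As a toy illustration of why the median can be a more conservative summary than the mean of such a mixture, here is a sketch with invented stand-in distributions (the shapes below are my assumptions for illustration, not the project's actual theory-specific models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-ins for welfare-range distributions under three theories:
neuron_count = rng.beta(2, 20, size=50_000)          # concentrated near zero
cognitive    = rng.beta(2, 5, size=50_000)           # moderate
undiluted    = rng.lognormal(0.0, 1.0, size=50_000)  # heavy right tail, can exceed 1

# Equal-weight mixture across the three theories
mixture = np.concatenate([neuron_count, cognitive, undiluted])

print(f"mean:   {mixture.mean():.2f}")      # pulled up by the heavy-tailed model
print(f"median: {np.median(mixture):.2f}")  # much less sensitive to the tail
```

The heavy-tailed "undiluted" component drives the mean well above the median, which mirrors how one high-variance theory of welfare can dominate a mixture's mean result.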
 

Seconding this question, and wanted to ask more broadly: 

A big component/assumption of the example given is that we can "re-run" simulations of the world in which different combinations of actors were present to contribute, but this seems hard in practice. Do you know of any examples where Shapley values have been used in the "real world" and how they've tackled this question of how to evaluate counterfactual worlds?

(Also, great post! I've been meaning to learn about Shapley values for a while, and this intuitive example has proven very helpful!)
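For anyone else learning alongside me, here is a brute-force Shapley computation for a hypothetical three-actor game. The coalition values in `v` are invented, and writing them down is exactly the hard part flagged above: each entry is a counterfactual estimate of the value produced if only that subset of actors existed:

```python
from itertools import permutations

# Invented coalition values for three hypothetical actors (the "re-run the
# world with only subset S" estimates that are hard to obtain in practice)
v = {
    frozenset(): 0,
    frozenset({"A"}): 4,
    frozenset({"B"}): 3,
    frozenset({"C"}): 0,
    frozenset({"A", "B"}): 9,
    frozenset({"A", "C"}): 5,
    frozenset({"B", "C"}): 4,
    frozenset({"A", "B", "C"}): 11,
}
players = ["A", "B", "C"]

def shapley(player):
    """Average the actor's marginal contribution over all join orders."""
    orders = list(permutations(players))
    total = sum(
        v[frozenset(o[: o.index(player)]) | {player}]
        - v[frozenset(o[: o.index(player)])]
        for o in orders
    )
    return total / len(orders)

values = {p: shapley(p) for p in players}
print(values)  # the shares sum to v of the grand coalition
```

The computation itself is mechanical; the open question raised above is where the eight `v` entries come from when you can't actually re-run the world.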

Hi Michael, here are some additional answers to your questions: 

1. I roughly calibrated the reasonable risk aversion levels based on my own intuition and using a Twitter poll I did a few months ago: https://x.com/Laura_k_Duffy/status/1696180330997141710?s=20. A significant number (about a third of those who are risk averse) of people would only take the bet to save 1000 lives vs. 10 for certain if the chance of saving 1000 was over 5%. I judged this a reasonable cut-off for the moderate risk aversion level. 

4. The reason the hen welfare interventions are much better than the shrimp stunning intervention is that shrimp harvest and slaughter don't last very long. So, the chronic welfare threats that ammonia concentrations and battery cages impose on shrimp and hens, respectively, outweigh the shorter-duration welfare threats of harvest and slaughter.

The number of animals for black soldier flies is low, I agree. We are currently using estimates of current populations, and this estimate is probably much lower than population sizes in the future. We're only somewhat confident in the shrimp and hens estimates, and pretty uncertain about the others. Thus, I think one should feel very much at liberty to plug in different numbers for population sizes for animals like black soldier flies.

More broadly, I think this result is likely a limitation of models based on total population size, versus models that are based more on the number of animals affected per campaign. Ideally, as we gather more information about these types of interventions, we could assess the cost-effectiveness using better estimates of the number of animals affected per campaign. 

Thanks for the thorough questions!
 

Hi Sylvester, thanks for sharing that post, I hadn't seen it! 

Hey, thanks for this detailed reply! 
When I said "practical", I more meant "simple things that people can do without needing to download and work directly with the code for the welfare ranges." In this sense, I don't entirely agree that your solution is the most workable of them (assuming independence probably would be). But I agree--pairwise sampling is the best method if you have the access and ability to manipulate the code! (I also think that the perfect correlation you graphed makes the second suggestion probably worse than just assuming perfect independence, so thanks!)
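To make the stakes of that correlation assumption concrete, here is a quick sketch with two toy lognormal welfare-range distributions (parameters invented): under perfect correlation their ratio collapses to a constant, while independent draws give a very wide ratio distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Perfectly correlated draws: both species share the same underlying shock z
z = rng.normal(size=n)
a_corr = np.exp(0.0 + z)
b_corr = np.exp(-1.0 + z)

# Independent draws: a separate shock for each species
a_ind = np.exp(0.0 + rng.normal(size=n))
b_ind = np.exp(-1.0 + rng.normal(size=n))

ratio_corr = a_corr / b_corr  # constant: always exactly e^1
ratio_ind = a_ind / b_ind     # lognormal with doubled log-variance

print(np.std(ratio_corr), np.std(ratio_ind))
```

Pairwise (correlated) sampling and full independence are the two extremes; the true joint distribution of welfare-range estimates presumably sits somewhere in between.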

Hi Kyle, 

This is a very interesting post! One quick and very small technical detail: Rethink Priorities' welfare ranges aren't capped at 1 for non-human animals. (It just happens that, when we adjusted for probability of sentience, all of the 50th percentile estimates fell below 1.) They're instead a reflection of the difference between the best and worst states that a non-human animal can experience, relative to the difference between the best and worst states that a human can experience (which is normalized to 1). In theory, this relative difference could be greater than 1 if the range in intensity of experiences that a non-human animal can have is wider than that of humans. 

In fact, one of our welfare range models (the undiluted experiences model) that feeds into the aggregate estimates tends to produce sentience-adjusted welfare range estimates greater than 1, under the theory that less cognitively complex organisms may not be able to dampen negative experiences by contextualizing them. As such, a few animals (octopuses, pigs, and shrimp) have 95th percentile estimates for their welfare ranges that are above 1. Here are some more details about the models and distributions: https://docs.google.com/document/d/1xUvMKRkEOJQcc6V7VJqcLLGAJ2SsdZno0jTIUb61D8k/edit?usp=sharing and here is the spreadsheet of results from all models: https://docs.google.com/spreadsheets/d/1SpbrcfmBoC50PTxlizF5HzBIq4p-17m3JduYXZCH2Og/edit?usp=sharing 

Again, this is a really thought-provoking and sobering post, thanks for writing it :)

Oh I see! Thanks for the clarification!
