A question jumped out at me when reading these results. I should caveat this by emphasizing that I am very much not an expert in this kind of evaluation and this question may be naive.
Is there any seasonal effect on mortality in Malawi? If so, is it ok for the pre-intervention period to be 12 months while the post-intervention period is 18 months? An 18-month window necessarily covers some seasons twice, so any seasonality in mortality could bias a before/after comparison.
If you're correct in the linked analysis, this sounds like a really important limitation in ACE's methodology, and I'm very glad you've shared this!
In case anyone else has the same confusion as me when reading your summary: I think there is nothing wrong with calculating a charity's cost-effectiveness by taking the weighted sum of the cost-effectiveness of all of their interventions (weighted by share of total funding that intervention receives). This should mathematically be the same as (Total Impact / Total Cost), and so should indeed go up if their spending on a particular intervention goes down (while achieving the same impact).
The (claimed) cause of the problem is just that ACE's cost-effectiveness estimate does not go up by anywhere near as much as it should when the cost of an intervention is reduced, leading the cost-effectiveness of the charity as a whole to actually change in the wrong direction when doing the above weighted sum!
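To make the arithmetic concrete, here is a toy sketch in Python (all numbers are made up for illustration, nothing from ACE's actual figures):

```python
# Toy example: a charity running two interventions.
# Per-intervention cost-effectiveness = impact / cost.
costs = [100.0, 50.0]      # spending on each intervention
impacts = [200.0, 150.0]   # impact achieved by each intervention

total_cost = sum(costs)
total_impact = sum(impacts)

# Weighted sum of per-intervention cost-effectiveness,
# weighted by each intervention's share of total spending.
weighted = sum((c / total_cost) * (i / c) for c, i in zip(costs, impacts))

# Algebraically identical to total impact / total cost.
assert abs(weighted - total_impact / total_cost) < 1e-12

# If spending on intervention 2 halves while its impact stays the same,
# the weighted-sum estimate goes *up*, as it should.
costs2 = [100.0, 25.0]
total_cost2 = sum(costs2)
weighted2 = sum((c / total_cost2) * (i / c) for c, i in zip(costs2, impacts))
assert weighted2 > weighted
```

The problem described in the linked analysis only arises if the per-intervention estimates fail to update properly when costs change, not from the weighting itself.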
If this is true it sounds pretty bad. Would be interested to read a response from them.
Of course, the other thing that could be going on here is that average cost-effectiveness is not the same as cost-effectiveness on the margin, which is presumably what ACE should care about. Though I don't see why an intervention representing a smaller share of a charity's expenditure should automatically mean that this is not where extra dollars would be allocated. The two things seem independent to me.
This is a fascinating summary!
I have a bit of a nitpicky question on the use of the phrase 'confidence intervals' throughout the report. Are these really supposed to be interpreted as confidence intervals, rather than the Bayesian alternative, 'credible intervals'?
My understanding was that the phrase 'confidence interval' has a very particular and subtle definition, coming from frequentist statistics: a 95% confidence interval is produced by a procedure that would contain the true parameter value in 95% of repeated samples, whereas a 95% credible interval is an interval that contains the parameter with 95% probability under the posterior distribution.
From my reading of the estimation procedure, it sounds a lot more like these CIs are supposed to be interpreted as the latter rather than the former? Or is that wrong?
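For concreteness, the frequentist reading can be demonstrated with a short coverage simulation (toy numbers of my own, nothing from the report): the 'confidence' is a property of the interval-generating procedure over repeated samples, not of any single realised interval.

```python
import random
import statistics

# Frequentist reading: the procedure "sample mean +/- 1.96 standard errors"
# captures the true mean in roughly 95% of repeated samples.
random.seed(0)
true_mean = 10.0
n, trials = 100, 2000

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        covered += 1

coverage = covered / trials
print(coverage)  # should land close to 0.95
```

A credible interval, by contrast, would be a statement about the posterior distribution given one observed sample, which sounds closer to what the report's estimation procedure produces.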
Appreciate this is a bit of a pedantic question, that the same terms can have different definitions in different fields, and that discussions about the definitions of terms aren't the most interesting discussions to have anyway. But the term jumped out at me when reading and so thought I would ask the question!
This is a really interesting post, and I appreciate how clearly it is laid out. Thank you for sharing it! But I'm not sure I agree with it, particularly the way that everything is pinned to the imminent arrival of AGI.
Firstly, the two assumptions you spell out in your introduction (that AGI is likely only a few years away, and that it will most likely come from scaled-up and refined versions of modern LLMs) are both much more controversial than you suggest, I think! (Although I'm not confident they are false either.)
But even if we accept those assumptions, the third big assumption here is that we can alter a superintelligent AGI's values in a predictable and straightforward way just by adding some synthetic training data expressing the views we like when building some of its component LLMs. This seems like a strange idea to me!
If we removed some concept from the training data completely, or introduced a new concept that had never appeared otherwise, then I can imagine that having some impact on the AGI's behaviour. But if all kinds of content are included in significant quantities anyway, then I find it hard to get my head around the inclusion of additional carefully chosen synthetic data having this kind of effect. I guess it clashes with my understanding of what a superintelligent AGI means to think that its behaviour could be altered by such simple manipulation.
I think an important aspect of this is that even if AGI does come from scaling up and refining LLMs, it is not going to be just an LLM in the straightforward sense of that term (i.e. something that communicates by generating each word with a single forward pass through a neural network). At the very least it must also have some sort of hidden internal monologue where it does chain-of-thought reasoning, stores memories, and so on.
But I don't know much about AI alignment, so would be very interested to read and understand more about the reasoning behind this third assumption.
All that said, even ignoring AGI, LLMs are likely going to be used more and more in people's everyday lives over the next few years, so training them to express kinder views towards animals seems like a potentially worthwhile goal anyway. I don't think AGI needs to come into it!
I agree that we can imagine a similar scenario where your identity is changed to a much lesser degree. But I'm still not convinced that we can straightforwardly apply the Platinum rule to such a scenario.
If your subjective wellbeing is increased after taking the pill, then one of the preferences that must be changed is your preference not to take the pill. This means that when we try to apply the Platinum rule: "treat others as they would have us treat them", we are naturally led to ask: "as they would have us treat them when?" If their preference to have taken the pill after taking it is stronger than their preference not to take the pill before taking it, the Platinum rule becomes less straightforward.
I can imagine two ways of clarifying the rule here, to explain why forcing someone to take the pill would be wrong, which you already allude to in your post.
In the post you say the Platinum rule might be the most important thing for a moral theory to get right, and I think I agree with you on this. It is something that seems so natural and obvious that I want to take it as a kind of axiom. But neither of these two extensions to it feel this obvious any more. They both seem very controversial.
I think the rule only properly makes sense when applied to a person-moment, rather than to a whole person throughout their life. If this is true, then I think my original objection still applies. We aren't dealing with a situation where we can apply the Platinum rule in isolation. Instead, we have just another utilitarian trade-off between the welfare of one (set of) person(-moments) and another.
This was a really thought-provoking read, thank you!
I think I agree with Richard Chappell's comment that: "the more you manipulate my values, the less the future person is me".
In this particular case, if I take the pill, my preferences, dispositions, and attitudes are being completely transformed in an instant. These are a huge part of what makes me who I am, so I think that after taking this pill I would become a completely different person, in a very literal sense. It would be a new person who had access to all of my memories, but it would not be me.
From this point of view, there is no essential difference between this thought experiment, and the common objection to total utilitarianism where you consider killing one person and replacing them with someone new, so that total well-being is increased.
This is still a troubling thought experiment of course, but I think it does weaken the strength of your appeal to the Platinum rule? We are no longer talking about treating a person differently to how they would want to be treated, in isolation. We just have another utilitarian thought experiment where we are considering harming person X in order to benefit a different person Y.
And I think my response to both thought experiments is the same. Killing a person who does not want to be killed, or changing the preferences of someone who does not want them changed, does a huge amount of harm (at least on a preference-satisfaction version of utilitarianism), so the assumption in these thought experiments that overall preference satisfaction is nevertheless increased is doing a lot of work, more work than it might appear at first.
I think this is a fascinating area, and the problems you've highlighted seem like important problems. I find it hard to believe it's a cause area EAs should focus on though.
As you explain, the clearest threat is the impact on cryptography, but it doesn't seem likely to me that that problem is neglected. There are huge incentives for governments and companies to solve it, and I think they are probably already doing lots of work on it?