tobycrisford 🔸
490 karma · tobycrisford.github.io/

Comments (108)

If you're correct in the linked analysis, this sounds like a really important limitation in ACE's methodology, and I'm very glad you've shared this!

In case anyone else has the same confusion as me when reading your summary: I think there is nothing wrong with calculating a charity's cost-effectiveness by taking the weighted sum of the cost-effectiveness of all of its interventions (weighted by the share of total funding each intervention receives). This should mathematically be the same as (Total Impact / Total Cost), and so should indeed go up if spending on a particular intervention goes down (while achieving the same impact).

The (claimed) cause of the problem is just that ACE's cost-effectiveness estimate does not go up by anywhere near as much as it should when the cost of an intervention is reduced, leading the cost-effectiveness of the charity as a whole to actually change in the wrong direction when doing the above weighted sum!
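To make the arithmetic concrete, here is a quick sketch with made-up numbers (the interventions, impacts, costs, and the "stale estimate" at the end are purely illustrative, and just my reading of the claimed problem):

```python
# Illustrative only: two hypothetical interventions (not ACE's actual figures).
interventions = {
    "A": {"impact": 60.0, "cost": 50.0},   # cost-effectiveness 1.2
    "B": {"impact": 100.0, "cost": 50.0},  # cost-effectiveness 2.0
}

def weighted_sum_ce(ints, ce_estimates=None):
    """Weighted sum of per-intervention cost-effectiveness,
    weighted by each intervention's share of total spending.
    If ce_estimates is given, use those scores instead of impact/cost."""
    total_cost = sum(v["cost"] for v in ints.values())
    return sum(
        (v["cost"] / total_cost)
        * (ce_estimates[k] if ce_estimates else v["impact"] / v["cost"])
        for k, v in ints.items()
    )

def ratio_ce(ints):
    """Total impact divided by total cost."""
    return sum(v["impact"] for v in ints.values()) / sum(v["cost"] for v in ints.values())

print(weighted_sum_ce(interventions), ratio_ce(interventions))  # 1.6 1.6 (identical)

# Halve B's cost while it achieves the same impact: B's true CE doubles to 4.0.
interventions["B"]["cost"] = 25.0
print(weighted_sum_ce(interventions), ratio_ce(interventions))  # ~2.13 ~2.13 (both rise)

# The claimed problem: if the per-intervention estimate barely moves when the cost
# falls (B still scored at 2.0 instead of 4.0), the weighted sum drops from 1.6 to
# ~1.47, because the genuinely better intervention now gets a smaller weight without
# the higher score that should compensate.
print(weighted_sum_ce(interventions, ce_estimates={"A": 1.2, "B": 2.0}))  # ~1.47
```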

If this is true it sounds pretty bad. Would be interested to read a response from them.

Of course, the other thing that could be going on here is that average cost-effectiveness is not the same as cost-effectiveness on the margin, which is presumably what ACE should care about. Though I don't see why an intervention representing a smaller share of a charity's expenditure should automatically mean that this is not where extra dollars would be allocated. The two things seem independent to me.
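As a toy illustration of why the two can come apart (numbers made up, and treating each intervention's returns as roughly linear):

```python
# Hypothetical charity: 90% of spending goes to intervention A, 10% to B.
spend = {"A": 90.0, "B": 10.0}    # dollars
impact = {"A": 90.0, "B": 50.0}   # units of good done

average_ce = sum(impact.values()) / sum(spend.values())  # 1.4 units per dollar

# But what an extra dollar buys depends on where it would actually be spent:
marginal_ce_if_B_funded = impact["B"] / spend["B"]  # 5.0 units per dollar
marginal_ce_if_A_funded = impact["A"] / spend["A"]  # 1.0 units per dollar
```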

I would be very interested to read a summary of what Tyler Cowen means by all this!

I know it was left as an exercise for the reader, but if someone wants to do the work for me it would be appreciated :)

This is a fascinating summary!

I have a bit of a nitpicky question on the use of the phrase 'confidence intervals' throughout the report. Are these really supposed to be interpreted as confidence intervals, rather than the Bayesian alternative, 'credible intervals'?

My understanding was that the phrase 'confidence interval' has a very particular and subtle definition, coming from frequentist statistics:

  • 80% confidence interval: for any possible value of the unknown parameter, if that were the true value, there is an 80% chance that your data-collection and estimation process would produce an interval containing it.
  • 80% credible interval: given the data you actually have, there is an 80% chance that the unknown parameter lies in the interval.
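In symbols (my notation, not the report's): writing θ for the unknown parameter, x for the observed data, and I(·) for the interval-producing procedure, the two statements are roughly:

```latex
% 80% confidence interval (frequentist): coverage holds for every possible parameter value
\Pr\bigl(\theta_0 \in I(X) \,\big|\, \theta = \theta_0\bigr) \geq 0.8
  \quad \text{for every possible value } \theta_0

% 80% credible interval (Bayesian): probability is conditional on the data actually observed
\Pr\bigl(\theta \in I(x) \,\big|\, X = x\bigr) = 0.8
```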

From my reading of the estimation procedure, it sounds a lot more like these CIs are supposed to be interpreted as the latter rather than the former? Or is that wrong?

I appreciate this is a bit of a pedantic question, that the same terms can have different definitions in different fields, and that discussions about definitions aren't the most interesting discussions to have anyway. But the term jumped out at me when reading, so I thought I would ask!

This is a really interesting post, and I appreciate how clearly it is laid out. Thank you for sharing it! But I'm not sure I agree with it, particularly the way that everything is pinned to the imminent arrival of AGI.

Firstly, the two assumptions you spell out in your introduction, that AGI is likely only a few years away and that it will most likely come from scaled-up and refined versions of modern LLMs, are both much more controversial than you suggest (I think)! (Although I'm not confident they are false either.)

But even if we accept those assumptions, the third big assumption here is that we can alter a superintelligent AGI's values in a predictable and straightforward way by just adding in some synthetic training data which expresses the views we like when building some of its component LLMs. This seems like a strange idea to me!

If we removed some concept from the training data completely, or introduced a new concept that had never appeared otherwise, then I can imagine that having some impact on the AGI's behaviour. But if all kinds of content are included in significant quantities anyway, then I find it hard to get my head around the inclusion of additional, carefully chosen synthetic data having this kind of effect. I guess it clashes with my understanding of what a superintelligent AGI means to think that its behaviour could be altered by such simple manipulation.

I think an important aspect of this is that even if AGI does come from scaling up and refining LLMs, it is not going to be just an LLM in the straightforward sense of that term (i.e. something that communicates by generating each word with a single forward pass through a neural network). At the very least it must also have some sort of hidden internal monologue where it does chain-of-thought reasoning, stores memories, and so on.

But I don't know much about AI alignment, so would be very interested to read and understand more about the reasoning behind this third assumption.

All that said, even ignoring AGI, LLMs are likely going to be used more and more in people's everyday lives over the next few years, so training them to express kinder views towards animals seems like a potentially worthwhile goal anyway. I don't think AGI needs to come into it!

I agree that we can imagine a similar scenario where your identity is changed to a much lesser degree. But I'm still not convinced that we can straightforwardly apply the Platinum rule to such a scenario.

If your subjective wellbeing is increased after taking the pill, then one of the preferences that must be changed is your preference not to take the pill. This means that when we try to apply the Platinum rule: "treat others as they would have us treat them", we are naturally led to ask: "as they would have us treat them when?" If their preference to have taken the pill after taking it is stronger than their preference not to take the pill before taking it, the Platinum rule becomes less straightforward.

I can imagine two ways of clarifying the rule here, to explain why forcing someone to take the pill would be wrong, which you already allude to in your post:

  • We should treat others as they would have us treat them at the time we are making the decision. But this would imply that if someone's preferences are about to naturally, predictably, change for the rest of their life, then we should disregard that when trying to decide what is best for them, and only consider what they want right now. This seems much more controversial than the original statement of the rule.
  • We should treat others as they would have us treat them, considering the preferences they would have over their lifetime if we did not act. But this would imply that if someone was about to eat the pill by accident, thinking it was just a sweet, and we knew it was against their current wishes, then we should not try to stop them or warn them. This would create a very odd action/inaction distinction. Again, this seems much more controversial than the original statement of the rule.

In the post you say the Platinum rule might be the most important thing for a moral theory to get right, and I think I agree with you on this. It is something that seems so natural and obvious that I want to take it as a kind of axiom. But neither of these two extensions to it feel this obvious any more. They both seem very controversial.

I think the rule only properly makes sense when applied to a person-moment, rather than to a whole person throughout their life. If this is true, then I think my original objection still applies. We aren't dealing with a situation where we can apply the Platinum rule in isolation. Instead, we have just another utilitarian trade-off between the welfare of one (set of) person(-moments) and another.

This was a really thought-provoking read, thank you!

I think I agree with Richard Chappell's comment that: "the more you manipulate my values, the less the future person is me".

In this particular case, if I take the pill, my preferences, dispositions, and attitudes are being completely transformed in an instant. These are a huge part of what makes me who I am, so I think that after taking this pill I would become a completely different person, in a very literal sense. It would be a new person who had access to all of my memories, but it would not be me.

From this point of view, there is no essential difference between this thought experiment and the common objection to total utilitarianism where you consider killing one person and replacing them with someone new, so that total wellbeing is increased.

This is still a troubling thought experiment of course, but I think it does weaken the strength of your appeal to the Platinum rule? We are no longer talking about treating a person differently to how they would want to be treated, in isolation. We just have another utilitarian thought experiment where we are considering harming person X in order to benefit a different person Y.

And I think my response to both thought experiments is the same. Killing a person who does not want to be killed, or changing the preferences of someone who does not want them changed, does a huge amount of harm (at least on a preference-satisfaction version of utilitarianism), so the assumption in these thought experiments that overall preference satisfaction is nevertheless increased is doing a lot of work, more work than it might appear at first.

I really like this thought experiment, thank you for sharing!

Personally, I agree with you, and I think the answer to your headline question is: yes! Your reasoning makes sense to me anyway. (At least if we don't combine the Self-Sampling Assumption with another assumption like the Self-Indication Assumption as well).

I think your example is essentially equivalent to the Doomsday argument, or the Adam+Eve paradox (see here: https://anthropic-principle.com/preprints/cau/paradoxes). But I like that your thought experiment really isolates the key problem and puts precise numbers on it!
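For anyone who hasn't seen it, the Self-Sampling Assumption calculation behind the Doomsday argument goes roughly like this (N, n, and the setup are my own notation, purely for illustration):

```latex
% N = total number of observers who will ever exist, n = your birth rank.
% SSA: conditional on N, treat yourself as a uniformly random draw from the N observers.
\Pr(n \mid N) = \frac{1}{N} \quad \text{for } n \leq N

% Bayes' theorem then shifts the posterior towards smaller totals N:
\Pr(N \mid n) \;\propto\; \Pr(n \mid N)\,\Pr(N) = \frac{\Pr(N)}{N} \quad \text{for } N \geq n
```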

I haven't digested the full paper yet, but based on the summary pasted below, this is precisely the claim I was trying to argue for in the "Against Anthropic Shadow" post of mine that you have linked.

It looks like this claim has been fleshed out in a lot more detail here though, and I'm looking forward to reading it properly!

In the post you linked I also went on quite a long digression trying to figure out if it was possible to rescue Anthropic Shadow by appealing to the fact that there might be large numbers of other worlds containing life (this plausibly weakens the strength of evidence provided by A, which may then stop the cancellation in C). I decided it technically was possible, but only if you take a strange approach to anthropic reasoning, with a strange and difficult-to-define observer reference class.

Possibly focusing so much on this digression was a mistake though, since the summary above is really pointing to the important flaw in the original argument!
