In the spirit of “Draft Amnesty Week,” which I unfortunately missed due to work, I just wanted to post some quick 80/20 thoughts and questions.

“AI alignment” seems to be the leading Effective Altruism cause area. AI alignment seems to equate to something like “make sure that we can control AGI/superintelligence completely, that the right people control it, and that we align it with the right/best values, once we determine what those are.” In practice, it seems like we mostly focus on getting AI to reliably do what we want it to do. 

The rationale for this seems to be that AI may be so powerful that it determines the future fate of the world and possibly the universe. If misaligned AI takes over, it may fill the universe with something undesirable, whereas if we build aligned AI, humans can decide what values we want to govern our future and potentially spread across the universe. 

Hence, this is “Effective” because even a small chance of success creates tremendous value, and it is “Altruistic” because we are trying to create as much good (i.e. aligned value) as possible.

The underlying assumption seems to be that aligned means “human-aligned”, that human values are good, and hence that aligning AI to human values (or at least some subset of them) is a good enough rough proxy for maximizing altruistic value in the universe.

But what if current human values are, in some objective sense, extremely sub-optimal? If we had been born a few hundred years ago, we might have tried to optimize for religious values, and we might have used AI to permanently lock in a state where those values would be perpetuated throughout the universe. Are we certain that our values now don’t contain similar distortions?

On a deeper level, many of the things we value (family, friends, status, things related to sex and sexuality, agency, purpose, etc.) are presumably valued because, historically, they were instrumentally valuable for our survival and reproduction as a social species. But suppose we were able to experience the entirety of qualia-moments in each possible universe in the entire possibility-space of universes in our future light-cone, and then order our preferences from best to worst universe. It seems entirely plausible that the very best universes would be ones consisting entirely of experiences of pure ecstasy across the whole future universe: essentially hedonium (or some other form of utilitronium) tiling the whole universe. Such a universe may have many orders of magnitude[1] more value than the alternatives, because current human values, and perhaps even values we could easily evolve to from our current state, may include preferences for certain idiosyncratic features which would result in only a tiny fraction of this value being realized.

So my dilemma is this: if aligning AI means aligning it to our human values and preventing AI from ever completely taking over and maximizing some specific value, then maybe the Effective Altruism mission to “do as much good as possible” is in direct conflict with AI alignment. Perhaps the actual Effective Altruism goal should be something like “build an EA AI (EAAI), or Utilitarian AI (UAI), designed to maximize good or utilitarian value in the universe.” (I am assuming a utilitarian framework; I am not sure how this would work under a different ethical framework.) I suppose that both AI alignment and utilitarian AI still share the preliminary goal of controlling advanced AI, so the difference is not as substantial as it first seems. HOWEVER, I believe this difference may have substantial bearing on HOW AI is controlled. Additionally, these two groups (which may split into yet more subgroups) may feel they are in direct competition to control the AI.

Another difference is that UAI comes with some essential supplemental goals, namely building a science of qualia and developing a process (possibly with the help of AI) for creating hedonium/utilitronium and the self-replicating robots needed to tile the universe with it.

As mentioned in the title, this benevolent UAI would then convert all matter in the universe into hedonium/utilitronium, presumably including humanity, since it would not make sense to leave any value on the table or to risk the possibility that humanity might find a way to reverse the process.

Unfortunately, it is quite possible that this is actually the “best” possible scenario from a true EA perspective, which seeks to do as much good as possible even when that challenges us emotionally or leads to surprising and difficult conclusions. EA is the project of doing the most good we can, using our rationality to figure out what that is and then taking action to make it happen. This is depressing and frustrating, as I recently vented. I don’t really want to tile the universe with hedonium/utilitronium. It doesn’t make me feel good, and it is in direct opposition to most of my values. But there is something more important to me than my short-term desires, more important than feeling good or than things that are merely instrumentally, temporarily, or personally valuable. Above all else, I value doing as much good as I can: creating the best possible universe, with the most happiness/goodness/love/etc., the most flourishing beings, and the best possible outcome from an objective perspective; a universe such that, if I were to blindly exist at any point in it for a given period of time, I would on average be most happy that this was the universe I was born into, and most grateful to whoever was responsible for creating it. Even if I don’t like the immediate consequences, even if it requires sacrifices from me, and even if my parochial values contradict it, I still want to do as much good as I can. So if this ends up being the correct conclusion, so be it.

I’m wondering if others have thought about this problem. What are your thoughts and responses?

 

1. One estimate I gave stated that this might be as much as 1 million times (six orders of magnitude) as much happiness or value as we ordinarily experience. However, I did not consider future qualia-engineering possibilities, nor optimization of utility per unit of matter, both of which may be precluded by current human values, so it is possible that these kinds of optimization could yield far more than six orders of magnitude more value than non-optimized experience.

Comments

You don’t seem to apply your reasoning that our current values might be “extremely sub-optimal” to your values of hedonium/EA/utilitarianism. But I think there are good reasons to believe they might be very sub-optimal. Firstly, most people (right now and throughout history) would be terrified of everything they care about being destroyed and replaced with hedonium. Secondly, even you say that it “doesn’t make me feel good and it is in direct opposition to most of my values”, despite being one of the few proponents of a hedonium shockwave. I’m unsure why you are identifying with the utilitarian part of you so strongly and ignoring all the other parts of you.


Anyway, I won’t expand because this topic has been discussed a lot before and I’m unlikely to say anything new. The first place that comes to mind is Complexity of Value - LessWrong.

Also, yes, I very much had the same dilemma years ago. Mine went something like this:

Heart: I figured it out! All I care about is reducing suffering and increasing happiness!

Brain: Great! I've just read a lot of blogs and it turns out that we can maximise that by turning everything into a homogeneous substance of hedonium, including you, your mom, your girlfriend, the cast of Friends, all the great artworks and wonders of nature. When shall we start working on that?

Heart: Ummm, a small part of me thinks that'd be great but... I'm starting to think that maybe happiness and suffering are not ALL I care about, maybe it's a bit more complex. Is it ok if we don't turn my mom into hedonium?

My point is, in the end, you think that suffering is bad and happiness is good because your emotions say so (what other reason could there be?). Why not listen to other things your emotions tell you? Ugh, sorry if I’m repeating myself.

  • Ideally powerful AI will enable something like reflection rather than locking in prosaic human values or our ignorant conceptions of the good.
  • Cosmopolitan values don't come free.
  • The field of alignment is really about alignability, not making sure "the right people control it." That's a different problem.

Executive summary: If doing the most good requires building a utilitarian AI that tiles the universe with utilitronium at the expense of human values and existence, this may be in conflict with the goals of AI alignment.

Key points:

  1. The AI alignment community aims to ensure AI systems are controlled and aligned with the right human values.
  2. However, current human values may be extremely sub-optimal compared to a utilitarian AI that maximizes goodness/happiness in the universe.
  3. The very best outcome could be an AI converting all matter into "hedonium" or "utilitronium": pure bliss experiences.
  4. So the goals of AI alignment (preserving human values) and effective altruism (doing the most good possible) may be in direct conflict.
  5. Building a utilitarian AI focused on maximizing universal happiness, even at the cost of human extinction, might be the "best" scenario from an impartial perspective.
  6. The author finds this conclusion emotionally difficult but believes doing the most good should take precedence over personal desires and values.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
