AGI ruin mostly rests on strong claims about alignment and deployment, not about society

RobBensinger

Dustin Moskovitz writes on Twitter:

My intuition is that MIRI's argument is almost more about sociology than computer science/security (though there is a relationship). People won't react until it is too late, they won't give up positive rewards to mitigate risk, they won't coordinate, the govt is feckless, etc.
And that's a big part of why it seems overconfident to people, bc sociology is not predictable, or at least isn't believed to be.

And Stefan Schubert writes:

I think it's good @robbensinger wrote a list of reasons he expects AGI ruin. It's well-written.
But it's notable and symptomatic that 9/10 reasons relate to the nature of AI systems and only 1/10 (discussed in less detail) to the societal response.

https://www.lesswrong.com/posts/eaDCgdkbsfGqpWazi/the-basic-reasons-i-expect-agi-ruin
Whatever one thinks the societal response will be, it seems like a key determinant of whether there'll be AGI ruin.
Imo the debate on whether AGI will lead to ruin systematically underemphasises this factor, focusing on technical issues.
It's useful to distinguish between warnings and all-things-considered predictions in this regard.
When issuing warnings, it makes sense to focus on the technology itself. Warnings aim to elicit a societal response, not predict it.
https://www.lesswrong.com/posts/gEShPto3F2aDdT3RY/sleepwalk-bias-self-defeating-predictions-and-existential
But when you actually try to predict what'll happen all-things-considered, you need to take the societal response into account in a big way
As such I think Rob's list is better as a list of reasons we ought to take AGI risk seriously, than as a list of reasons it'll lead to ruin

My reply is:

It's true that in my "top ten reasons I expect AGI ruin" list, only one of the sections is about the social response to AGI risk, and it's a short section. But the section links to some more detailed discussions (and quotes from them in a long footnote):

Also, discussing the adequacy of society's response before I've discussed AGI itself at length doesn't really work, I think, because I need to argue for what kind of response is warranted before I can start arguing that humanity is putting insufficient effort into the problem.

If you think the alignment problem itself is easy, then I can cite all the evidence in the world regarding "very few people are working on alignment" and it won't matter.

If you think a slowdown is unnecessary or counterproductive, then I can point out that governments haven't placed a ceiling on large training runs and you'll just go "So? Why should they?"

Society's response can only be inadequate given some model of what's required for adequacy. That's a lot of why I factor out that discussion into other posts.^[1]

More importantly, contra Dustin, I don't see myself as having strong priors or complicated models regarding the social situation.

Eliezer Yudkowsky similarly says he doesn't have strong predictions about what governments or communities will do in this or that situation (beyond anti-predictions like "they probably won't do specific thing X that's wildly different from anything they've done before"):

[Ngo][12:26]
The other thing is that, for pedagogical purposes, I think it'd be useful for you to express some of your beliefs about how governments will respond to AI
I think I have a rough guess about what those beliefs are, but even if I'm right, not everyone who reads this transcript will be
[Yudkowsky][12:28]
Why would I be expected to know that? I could talk about weak defaults and iterate through an unending list of possibilities.
Thinking that Eliezer thinks he knows that to any degree of specificity feels like I'm being weakmanned!
[Ngo][12:28]
I'm not claiming you have any specific beliefs
[Yudkowsky][12:29]
I suppose I have skepticism when other people dream up elaborately positive and beneficial reactions apparently drawn from some alternate nicer political universe that had an absolutely different response to Covid-19, and so on.
[Ngo][12:29]
But I'd guess that your models rule out, for instance, the US and China deeply cooperating on AI before it's caused any disasters
[Yudkowsky][12:30]
"Deeply"? Sure. That sounds like something that has never happened, and I'm generically skeptical about political things that go better than any political thing has ever gone before.

I don't feel pessimistic about society across all domains, I don't think most tech or scientific progress is at all dangerous or bad, etc. It's mostly just that AGI looks like a super unusual and hard problem to me.

To imagine civilization behaving really unusually and doing something a lot harder than it's ever done, I need strong predictive models saying why civilization will do those things. Adequate strategies are conjunctive; I don't need special knowledge to predict "not that".

It's true that this requires a bare minimum model of civilization saying that we aren't a sane, coordinated super-agent that just handles problems whenever there's something important to do.

If humanity did consistently strategically scale its efforts with the difficulty and importance of problems in the world (even when weird and abstract analysis is required to see how hard and important the problem is), then I would expect us to just flexibly scale up our efforts and modify all our old heuristics in response to the alignment problem.^[2]

So I'm at least making the anti-prediction "civilization isn't specifically like that".

Example: I don't in fact see my high p(doom) as resting on a strong assumption about whether people will panic and ban a bunch of AI things. My high level of concern is predicated on a reasonable amount of uncertainty about whether that will happen.

The issue is that "people panic and ban things", while potentially helpful on the margin, does not consistently save the world and cause the long-term future to go well (and there's a nontrivial number of worlds where it makes things worse on net). The same issue of aligning and wielding powerful tech has to be addressed anyway.

Maybe panic buys us another 5 years, optimistically; maybe it even buys us 20, amazingly. But if superintelligence comes in 2055 rather than 2035, I still very much expect catastrophe. So possibilities like this don't strongly shift the set of worlds I expect to see toward optimistic outcomes.

Stefan replies on Twitter:

Thanks, Rob, this is helpful.
I do actually think you should put the kinds of arguments you give here [...] in posts like this, since "people will rise to the occasion" seems like one of the key counter-arguments to your views; so it seems central to rebut that.
I also think there's some tension between being uncertain about what the societal response will be and being relatively certain of doom. (Though it depends on the levels of un/certainty.)
I think many would give the simple argument:
P1: Whether there'll be AI doom depends on the societal response
P2: It's uncertain what the societal response will be
C: It's uncertain whether there'll be AI doom (so P(doom) isn't very high)
Could be good to address that head on

There's of course tension! Indeed, I'd phrase it more strongly than that: uncertainty about the societal response is one of the largest reasons I still have any hope for the future. It's one of the main factors pushing against high p(doom), on my model.

"We don't know exactly how hard alignment is, and in the end it's just a technical problem" is plausibly an even larger factor. It's easier to get clear data about humanity's coordination ability than to get clear data about how hard alignment is: we have huge amounts of direct observational data about how humans and nations tend to behave, whereas no amount of failed work can rule out the possibility that someone will come up with a brilliant new alignment approach tomorrow that just works.

That said, there are enough visible obstacles to alignment, and enough failed attempts have been made at this point, that I'm willing to strongly bet against a miracle solution occurring (while working to try to prove myself wrong about this).

"Maybe society will coordinate to do something miraculous" and "maybe we'll find a miraculously effective alignment solution" are possibilities that push in the direction of hope, but they don't strike me as likely in absolute terms.

The main reason "maybe society will do something miraculous" seems unlikely to me is that the scale of the required miracle seems very large.

This is because:

I think it's very likely that we'll need to solve both the alignment problem and the deployment problem in order to see good outcomes.
It seems to me that these two problems both require getting a large number of things right, and some of these things seem very hard, and/or seem to require us to approach the problem in very novel and unusual ways.

"AGI Ruin" and "Capabilities Generalization, and the Sharp Left Turn" make the case for the alignment problem seeming difficult and/or out-of-scope for business-as-usual machine learning.

"Pivotal acts seem hard" and "there isn't a business-as-usual way to prevent AGI tech from proliferating and killing everyone" illustrate why the deployment problem seems difficult and/or demanding of very novel strategies, and Six Dimensions of Operational Adequacy in AGI Projects fills in a lot of the weird-or-hard details.

When we're making a large enough ask of civilization (in terms of raw difficulty, and/or in terms of requiring civilization to go wildly off-script and do things in very different ways than it has in the past), we can have a fair amount of confidence that civilization won't fulfill the ask even if we're highly uncertain about the specific dynamics at work, the specific course history will take, etc.
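As a rough illustration of why a conjunctive ask can be confidently predicted to fail even under substantial uncertainty about each part, here is a minimal sketch. It assumes the required steps succeed independently, which is a simplification, and the 70% per-step figure and the step counts are hypothetical numbers chosen only for illustration; they are not estimates from this post or from anyone quoted in it.

```python
def joint_success(per_step_probs):
    """Probability that every required step goes right.

    Assumes the steps are independent (an illustrative simplification).
    """
    result = 1.0
    for p in per_step_probs:
        result *= p
    return result


if __name__ == "__main__":
    # Hypothetical numbers: each step is individually "more likely than not".
    for n_steps in (2, 5, 10):
        p = joint_success([0.7] * n_steps)
        print(f"{n_steps} steps at 70% each -> joint probability {p:.3f}")
```

The point of the sketch is only the shape of the calculation: being quite unsure about each individual step is compatible with being fairly sure that not all of them will go right at once. Correlations between steps could push the joint probability up or down, which is one reason the numbers here should not be read as a forecast.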

^[1]
It's also not clear to me what Stefan (or Dustin) would want me to actually say about society, in summarizing my views.
In the abstract, it's fine to say "society is very important, so it's weird if only 1/10 of the items discuss society". But I don't want to try to give equal time to technical and social issues just for the sake of emphasizing the importance of social factors. If I'm going to add more sentences to a post, I want it to be because the specific claims I'm adding are important, unintuitive, etc. What are the crucial specifics that are missing?
^[2]
Though if we actually lived in that world, we would have already made that observation. A sane world that nimbly adapts its policies in response to large and unusual challenges doesn't wait until the last possible minute to snatch victory from the jaws of defeat; it gets to work on the problem too early, tries to leave itself plenty of buffer, etc.

Comments (4)

MichaelPlant, Apr 24 2023

To chime in, I think it would be helpful to distinguish between:

1. AI risks on a 'business as usual' model, where society continues as it was before, ie not doing much

and

2. AI risks given different levels of society response.

This would then be analogous to familiar discussions about climate change, where people talk about different CO2 rise scenarios, how bad each would be, and also how much effort is required to achieve different levels of reduced emissions. I recognise it's not very easy to specify options for 2, but it seems worth a try. To decide how much effort to put in, we need to understand the risk in 1 and how much it can go down for versions of 2, and the costs involved.

To elaborate, someone could say

(A) we're almost certainly screwed, whatever we do

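(B) we might be screwed, but not if we get our act together, which we're not doing now

(C) we might be screwed, but not if we get our act together, which I'm confident will happen anyway

(D) there's nothing to worry about in the first place.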

Obviously, these aren't the only options. (A), (C), and (D) imply that few or no additional resources are useful, whereas (B) implies extra resources are worthwhile. My impression is Yudkowsky's line is (A).

RobBensinger, Apr 24 2023

To chime in, I think it would be helpful to distinguish between:
1. AI risks on a 'business as usual' model, where society continues as it was before, ie not doing much
and
2. AI risks given different levels of society response.

I like this! Richard Ngo and Eliezer discuss this a bit in Ngo's view on alignment difficulty:

[Ngo] (Sep. 25 [2021] Google Doc)

Perhaps the best way to pin down disagreements in our expectations about the effects of the strategic landscape is to identify some measures that could help to reduce AGI risk, and ask how seriously key decision-makers would need to take AGI risk for each measure to be plausible, and how powerful and competent they would need to be for that measure to make a significant difference. Actually, let’s lump these metrics together into a measure of “amount of competent power applied”. Some benchmarks, roughly in order (and focusing on the effort applied by the US):

Banning chemical/biological weapons
COVID
- Key points: mRNA vaccines, lockdowns, mask mandates
Nuclear non-proliferation
- Key points: Nunn-Lugar Act, Stuxnet, various treaties
The International Space Station
- Cost to US: ~$75 billion
Climate change
- US expenditure: >$154 billion (but not very effectively)
Project Apollo
- Wikipedia says that Project Apollo “was the largest commitment of resources ($156 billion in 2019 US dollars) ever made by any nation in peacetime. At its peak, the Apollo program employed 400,000 people and required the support of over 20,000 industrial firms and universities.”
WW1
WW2

[Yudkowsky][12:02] (Sep. 25 [2021] comment)

WW2

This level of effort starts to buy significant amounts of time. This level will not be reached, nor approached, before the world ends.

See the post for more discussion, including an update from Eliezer: "I've updated somewhat off of Carl Shulman's argument that there's only one chip supply chain which goes through eg a single manufacturer of lithography machines (ASML), which could maybe make a lock on AI chips possible with only WW1 levels of cooperation instead of WW2."

Eliezer's "Pausing AI Developments Isn't Enough. We Need to Shut it All Down" is also trying to do something similar, as is his (written-in-2017) post "Six Dimensions of Operational Adequacy in AGI Projects".

I interpret "Pausing AI Developments Isn't Enough" as saying "if governments did X, then we'd still probably be in enormous amounts of danger, but there would now be a non-tiny probability of things going well". (Maybe even a double-digit probability of things going well for humanity.)

Eliezer doesn't think governments are likely to do X, but he thinks we should make a desperate effort to somehow pull off getting governments to do X anyway on EV grounds: there aren't any markedly-more-hopeful alternatives, and we're all dead if we fail.

(Though there may be some other similarly-hopeless-but-worth-trying-anyway options, like moonshot attempts to solve the alignment problem, or a Manhattan Project to build nanotechnology, or what-have-you. My Eliezer-model wants highly competent and sane people pursuing all of these unlikely-to-work ideas in parallel, because then it's more likely that at least one succeeds.)

Six Dimensions of Operational Adequacy in AGI Projects divides amounts of effort into "token", "improving", "adequate", "excellent", and "unrealistic", but it doesn't say how high the risk level is under different buckets. I think this is mostly because Eliezer's model gives a macroscopic probability to success if an AGI project is "adequate" on all six dimensions at once, and a tiny probability to success if it falls short of adequacy on any dimension.

My Eliezer-model thinks that "token" and "improving" both mean you're dead, and he doesn't necessarily think he can give meaningful calibrated confidences that distinguish degrees of deadness when the situation looks that bad.

(A) we're almost certainly screwed, whatever we do
(B) we might be screwed, but not if we get our act together, which we're not doing now
(C) we might be screwed, but not if we get our act together, which I'm confident will happen anyway
(D) there's nothing to worry about in the first place.
Obviously, these aren't the only options. (A), (C), and (D) imply that few or no additional resources are useful, whereas (B) implies extra resources are worthwhile. My impression is Yudkowsky's line is (A).

Seems like a wrong framing to me. My model (and Eliezer's) is that A and B are both right: We're almost certainly screwed, whatever we do; but not if humanity gets its act together in a massive way (which we're currently not doing, but should try to do because otherwise we're dead).

"No additional resources are useful" makes it sound like Eliezer is advocating for humanity to give up, which he obviously isn't doing. Rather, my view and Eliezer's is that we should try to save the world (because the alternative is ruin), even though some things will have to go miraculously right in order for our efforts to succeed.

RobBensinger, Apr 24 2023

Dustin Moskovitz comments on Twitter:

The deployment problem is part of societal response to me, not separate.
[...] Eg race dynamics, regulation (including ability to cooperate with competitors), societal pressure on leaders, investment in watchdogs (human and machine), safety testing norms, whether things get open sourced, infohazards.

"The deployment problem is hard and weird" comes from a mix of claims about AI (AGI is extremely dangerous, you don't need a planet-sized computer to run it, software and hardware can and will improve and proliferate by default, etc.) and about society ("if you give a decent number of people the ability to wield dangerous AGI tech, at least one of them will choose to use it").

The social claims matter — two people who disagree about how readily Larry Page and/or Mark Zuckerberg would put the world at risk might as a result disagree about whether a Good AGI Project has median 8 months vs. 12 months to do a pivotal act.

When I say "AGI ruin rests on strong claims about the alignment problem and deployment problem, not about society", I mean that the claims you need to make about society in order to think the alignment and deployment problems are that hard and weird, are weak claims (e.g. "if fifty random large AI companies had the ability to use dangerous AGI, at least one would use it"), and that the other claims about society required for high p(doom) are weak too (e.g. "humanity isn't a super-agent that consistently scales up its rationality and effort in proportion to a problem's importance, difficulty, and weirdness").

Arguably the difficulty of the alignment problem itself also depends in part on claims about society. E.g., the difficulty of alignment depends on the difficulty of the task we're aligning, which depends on "what sort of task is needed to end the acute x-risk period?", which depends again on things like "will random humans destroy the world if you hand them world-destroying AGI?".

The thing I was trying to communicate (probably poorly) isn't "Alignment, Deployment, and Society partitions the space of topics", but rather:

High p(doom) rests on strong claims about AI/compute/etc. and quite weak claims about humanity/society.
The most relevant claims (~all the strong ones, and an important subset of the weak ones) are mostly claims about the difficulty, novelty, and weirdness of the alignment and deployment problems.

RobBensinger, Apr 24 2023

Note that if it were costless to make the title way longer, I'd change this post's title from "AGI ruin mostly rests on strong claims about alignment and deployment, not about society" to the clearer:

The AGI ruin argument mostly rests on claims that the alignment and deployment problems are difficult and/or weird and novel, not on strong claims about society