Matthew_Barnett's Quick takes

Matthew_Barnett

Effective Altruism Forum

This is a special post for quick takes by Matthew_Barnett. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.



Matthew_Barnett

Feb 13 (45 karma)

A reflection on the posts I have written in the last few months, elaborating on my views

In a series of recent posts, I have sought to challenge the conventional view among longtermists that prioritizes the empowerment or preservation of the human species as the chief goal of AI policy. It is my opinion that this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.

I recognize that my position is controversial and likely to remain unpopular among effective altruists for a long time. Nevertheless, I believe it is worth articulating my view at length, as I see it as a straightforward application of standard, common-sense utilitarian principles that merely lead to an unpopular conclusion. I intend to continue elaborating on my arguments in the coming months.

My view follows from a few basic premises. First, that future AI systems are quite likely to be moral patients; second, that we shouldn’t discriminate against them based on arbitrary distinctions, such as their being instanti... (read more)

David_Moss

Feb 13* (22 karma)

Thanks for writing on this important topic!

I think it's interesting to assess how popular or unpopular these views are within the EA community. This year and last year, we asked people in the EA Survey about the extent to which they agreed or disagreed that:

Most expected value in the future comes from digital minds' experiences, or the experiences of other nonbiological entities.

This year about 47% (strongly or somewhat) disagreed, while 22.2% agreed (roughly a 2:1 ratio).

However, among people who rated AI risks a top priority, respondents leaned towards agreement, with 29.6% disagreeing and 36.6% agreeing (a 0.8:1 ratio).^[1]

Similarly, among the most highly engaged EAs, attitudes were roughly evenly split between 33.6% disagreement and 32.7% agreement (1.02:1), with much lower agreement among everyone else.
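As a quick check on the ratios quoted above, here is a minimal sketch that recomputes the disagree:agree ratio from the rounded percentages given in this comment. The group labels and function name are illustrative, not from the survey. Because the inputs are rounded, the last digit can differ slightly from the quoted figures, which may have been computed from unrounded data.

```python
# Disagree/agree percentages as quoted in the comment (rounded values).
# The dictionary keys are illustrative labels, not survey terminology.
groups = {
    "all respondents": (47.0, 22.2),
    "AI risk top priority": (29.6, 36.6),
    "most highly engaged": (33.6, 32.7),
}


def disagree_to_agree_ratio(disagree_pct, agree_pct):
    """Return x in the 'x:1' disagree:agree ratio."""
    return disagree_pct / agree_pct


ratios = {
    name: disagree_to_agree_ratio(disagree, agree)
    for name, (disagree, agree) in groups.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}:1")
```

A ratio above 1 means disagreement outweighs agreement in that group; a ratio below 1 means the reverse.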

This suggests to me that collective opinion among EAs who strongly prioritise AI risks, and among the most highly engaged, is not so hostile to digital minds. Of course, for practical purposes, what matters most might be the attitudes of a small number of decisionmakers, but I think the attitudes of engaged EAs matter for epistemic reasons.

^[1]
Interestingly, a

... (read more)

Lukas_Gloor

Feb 13 (15 karma)

I haven't read your other recent comments on this, but here's a question on the topic of pausing AI progress. (The point I'm making is similar to what Brad West already commented.)

Let's say we grant your assumptions (that AIs will have values that matter the same as or more than human values and that an AI-filled future would be just as or more morally important than one with humans in control). Wouldn't it still make sense to pause AI progress at this important junction to make sure we study what we're doing so we can set up future AIs to do as well as (reasonably) possible?

You say that we shouldn't be confident that AI values will be worse than human values. We can put a pin in that. But values are just one feature here. We should also think about agent psychologies and character traits and infrastructure beneficial for forming peaceful coalitions. On those dimensions, some traits or setups seem (somewhat robustly?) worse than others?

We're growing an alien species that might take over from humans. Even if you think that's possibly okay or good, wouldn't you agree that we can envision factors about how AIs are built/trained and about what sort of world they are placed in that affe... (read more)

Matthew_Barnett

Feb 14

In your comment, you raise a broad but important question about whether, even if we reject the idea that human survival must take absolute priority over other concerns, we might still want to pause AI development in order to “set up” future AIs more thoughtfully. You list a range of traits (things like pro-social instincts, better coordination infrastructures, or other design features that might improve cooperation) that, in principle, we could try to incorporate if we took more time. I understand and agree with the motivation behind this: you are asking whether there is a prudential reason, from a more inclusive moral standpoint, to pause in order to ensure that whichever civilization emerges, whether dominated by humans, AIs, or both at once, turns out as well as possible in ways that matter impartially, rather than focusing narrowly on preserving human dominance.

Having summarized your perspective, I want to clarify exactly where I differ from your view, and why.

First, let me restate the perspective I defended in my previous post on delaying AI. In that post, I was critiquing what I see as the “standard case” for pausing AI, as I perceive it being made in many EA circles. This standard case for pausing AI often treats preventing human extinction as so paramount that any delay of AI progress, no matter how costly to currently living people, becomes justified if it incrementally lowers the probability of humans losing control. Under this argument, the reason we want to pause is that time spent on “alignment research” can be used to ensure that future AIs share human goals, or at least do not threaten the human species.

My critique had two components: first, I argued that pausing AI is very costly to people who currently exist, since it delays medical and technological breakthroughs that could be made by advanced AIs, thereby forcing a lot of people to die who could have otherwise been saved. Second, and more fundamentally, I argued that this "standard case" seems to r

Ozzie Gooen

Feb 13 (13 karma)

I think it's interesting and admirable that you're dedicated to a position that's so unusual in this space.

I assume I'm in the majority here in that my intuitions are quite different from yours, however.

One quick point while we're here:
> this view is likely rooted in a bias that automatically favors human beings over artificial entities—thereby sidelining the idea that future AIs might create equal or greater moral value than humans—and treating this alternative perspective with unwarranted skepticism.

I think that a common, but perhaps not well vocalized, utilitarian take is that humans don't have much of a special significance in terms of creating well-being. The main option would be a much more abstract idea, some kind of generalization of hedonium or consequentialism-ium or similar. For now, let's define hedonium as "the ideal way of converting matter and energy into well-being, after a great deal of deliberation."

As such, it's very tempting to try to separate concerns and have AI tools focus on being great tools, and separately optimize hedonium to be efficient at being well-being. While I'm not sure if AIs would have zero qualia, I'd feel a lot more confident that the... (read more)

quinn

Feb 16

I distinguish believing that good successor criteria are brittle from speciesism. I think antispeciesism does not oblige me to accept literally any successor. I do feel icky coalitioning with outright speciesists (who reject the possibility of a good successor in principle), but I think my goals and all of generalized flourishing benefit a lot from those coalitions, so I grin and bear it.

David Mathers🔸

Feb 13

I think for me, part of the issue with your posts on this (which I think are net positive to be clear, they really push at significant weak points in ideas widely held in the community) is that you seem to be sort of vacillating between three different ideas, in a way that conceals that one of them, taken on its own, sounds super-crazy and evil:

1) Actually, if AI development were to literally lead to human extinction, that might be fine, because it might lead to higher utility.
2) We should care about humans harming sentient, human-like AIs as much as we care about AIs harming humans.
3) In practice, the benefits to current people from AI development outweigh the risks, and the only moral views which say that we should ignore this and pause in the face of even tiny risks of extinction from AI because there are way more potential humans in the future, in fact, when taken seriously, imply 1), which nobody believes.

1) feels extremely bad to me, basically a sort of Nazi-style view on which genocide is fine if the replacing people are superior utility generators (or I guess, inferior but sufficiently more numerous). 1) plausibly is a consequence of classical utilitarianism (even maybe on some person-affecting versions of classical utilitarianism I think), but I take this to be a reason to reject pure classical utilitarianism, not a reason to endorse 1). 2) and 3), on the other hand, seem reasonable to me.

But the thing is that you seem at least sometimes to be taking AI moral patienthood as a reason to push on in the face of uncertainty about whether AI will literally kill everyone. And that seems more like 1) than 2) or 3). 1-style reasoning supports the idea that AI moral patienthood is a reason for pushing on with AI development even in the face of human extinction risk, but as far as I can tell 2) and 3) don't. At the same time though I don't think you mean to endorse 1).

Matthew_Barnett

Feb 13* (16 karma)

I realize my position can be confusing, so let me clarify it as plainly as I can: I do not regard the extinction of humanity as anything close to “fine.” In fact, I think it would be a devastating tragedy if every human being died. I have repeatedly emphasized that a major upside of advanced AI lies in its potential to accelerate medical breakthroughs—breakthroughs that might save countless human lives, including potentially my own. Clearly, I value human lives, as otherwise I would not have made this particular point so frequently.

What seems to cause confusion is that I also argue the following more subtle point: while human extinction would be unbelievably bad, it would likely not be astronomically bad in the strict sense used by the "astronomical waste" argument. The standard “astronomical waste” argument says that if humanity disappears, then all possibility for a valuable, advanced civilization vanishes forever. But in a scenario where humans die out because of AI, civilization would continue—just not with humans. That means a valuable intergalactic civilization could still arise, populated by AI rather than by humans. From a purely utilitarian perspective that counts the exis... (read more)

David Mathers🔸

Feb 17

Thanks, that is very helpful to me in clarifying your position.

harfe

Feb 13

I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says so maybe this is one of these cases and I should be more careful.

Jonas Hallgren 🔮

Feb 13

FWIW, I completely agree with what you're saying here. I think that if you seriously go into consciousness research, especially on what we Westerners tend to label a sense of self, it quickly becomes infeasible to hold the position that the direction we're taking AI development, e.g. towards AI agents, will not lead to AIs having self-models. For all intents and purposes, this encompasses most physicalist or non-dual theories of consciousness, which are the only feasible ones unless you want to bite some really sour apples.

There's a classic "what are we getting wrong" question in EA, and I think it's extremely likely that we will look back in 10 years and say, "wow, what were we doing here?"

I think it's a lot better to think of systemic alignment and look at properties that we want for the general collective intelligences that we're engaging in, such as our information networks or our institutional decision-making procedures, and think of how we can optimise these for resilience and truth-seeking. If certain AIs deserve moral patienthood then that truth will naturally arise from such structures. (hot take) Individual AI alignment might honestly be counter-productive towards this view.

Matthew_Barnett

Apr 25 2024 (134 karma)

In this "quick take", I want to summarize some of my idiosyncratic views on AI risk.

My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.

(Note that I won't spend a lot of time justifying each of these views here. I'm mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)

Skepticism of the treacherous turn: The treacherous turn is the idea that (1) at some point there will be a very smart unaligned AI, (2) when weak, this AI will pretend to be nice, but (3) when sufficiently strong, this AI will turn on humanity by taking over the world by surprise, and then (4) optimize the universe without constraint, which would be very bad for humans.

By comparison, I find it more likely that no individual AI will ever be strong enough to take over

... (read more)

Owen Cotton-Barratt

Apr 25 2024 (31 karma)

I want to say thank you for holding the pole of these perspectives and keeping them in the dialogue. I think that they are important and it's underappreciated in EA circles how plausible they are.

(I definitely don't agree with everything you have here, but typically my view is somewhere between what you've expressed and what is commonly expressed in x-risk focused spaces. Often also I'm drawn to say "yeah, but ..." -- e.g. I agree that a treacherous turn is not so likely at global scale, but I don't think it's completely out of the question, and given that I think it's worth serious attention safeguarding against.)

Ryan Greenblatt

Apr 25 2024

Explicit +1 to what Owen is saying here. (Given that I commented with some counterarguments, I thought I would explicitly note my +1 here.)

Ryan Greenblatt

Apr 25 2024 (14 karma)

> In fact, it is difficult for me to name even a single technology that I think is currently underregulated by society.

The obvious example would be synthetic biology, gain-of-function research, and similar.

I also think AI itself is currently massively underregulated even entirely ignoring alignment difficulties. I think the probability of the creation of AI capable of accelerating AI R&D by 10x this year is around 3%. It would be extremely bad for US national interests if such an AI was stolen by foreign actors. This suffices for regulation ensuring very high levels of security IMO. And this is setting aside ongoing IP theft and similar issues.

Matthew_Barnett

Apr 25 2024

Can you explain why you suspect these things should be more regulated than they currently are?

Ryan Greenblatt

Apr 25 2024 (14 karma)

> In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking unethical actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.

This reasoning seems to imply that you could use GPT-2 to oversee GPT-4 by bootstrapping from a chain of models of scales between GPT-2 and GPT-4. However, this isn't true: the weak-to-strong generalization paper finds that this doesn't work, and indeed bootstrapping like this doesn't help at all for ChatGPT reward modeling (it helps on chess puzzles and for nothing else they investigate, I believe).

I think this sort of bootstrapping argument might work if we could ensure that each model in the chain was sufficiently aligned and capable of reasoning such that it would carefully reason about what humans would want if the... (read more)

Matthew_Barnett

Feb 3 2024 (75 karma)

I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:

* Unlike existential risk from other sources (e.g. an asteroid) AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
* Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don't appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn't necessarily matter whether AIs are conscious.
* Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of "population accelerationism". Extremely large AI populations could be preferable under utilitarianism compared to small human populations, even those with high per-ca

... (read more)

emre kaplan🔸

Feb 3 2024 (19 karma)

I think a more important reason is the additional value of the information and the option value. It's very likely that the change resulting from AI development will be irreversible. Since we're still able to learn about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving "utopia" rather than landing into "mediocrity" by 2 percent seems far more important than speeding up utopia by 10 years.
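The comparison in the last sentence can be made concrete with a toy expected-value model. This is only a sketch under assumptions that are not in the comment: the baseline probability of "utopia", the per-year value of "mediocrity", and the choice to count the pre-AI years as worth zero (which favours the speed-up side) are all made up for illustration. Only the 2-point probability shift and the 10-year speed-up come from the text.

```python
# All numbers are illustrative assumptions except PROB_SHIFT and YEARS_SOONER.
P_UTOPIA = 0.5          # assumed baseline probability of "utopia"
VALUE_UTOPIA = 1.0      # value per year of "utopia" (normalised)
VALUE_MEDIOCRITY = 0.5  # assumed value per year of "mediocrity"
PROB_SHIFT = 0.02       # "2 percent" from the comment, read as percentage points
YEARS_SOONER = 10       # "10 years" from the comment


def gain_from_probability_shift(horizon_years):
    """Expected gain from moving PROB_SHIFT of probability mass to utopia."""
    return PROB_SHIFT * (VALUE_UTOPIA - VALUE_MEDIOCRITY) * horizon_years


def gain_from_speedup():
    """Expected gain from reaching the post-AI outcome YEARS_SOONER earlier.

    Treats the displaced pre-AI years as worth zero, which overstates
    the benefit of speeding up.
    """
    expected_value_per_year = (
        P_UTOPIA * VALUE_UTOPIA + (1 - P_UTOPIA) * VALUE_MEDIOCRITY
    )
    return YEARS_SOONER * expected_value_per_year


def break_even_horizon():
    """Horizon (years) beyond which the probability shift is worth more."""
    per_year_gain = PROB_SHIFT * (VALUE_UTOPIA - VALUE_MEDIOCRITY)
    return gain_from_speedup() / per_year_gain


print(f"break-even horizon: {break_even_horizon():.0f} years")
```

Under these made-up numbers, the probability shift wins whenever the locked-in outcome lasts longer than the break-even horizon. Different parameter choices move that threshold, so the sketch illustrates the structure of the argument and does not settle it.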

Matthew_Barnett

Feb 3 2024

I think all actions are in a sense irreversible, but large changes tend to be less reversible than small changes. In this sense, the argument you gave seems reducible to "we should generally delay large changes to the world, to preserve option value". Is that a reasonable summary? In this case I think it's just not obvious that delaying large changes is good. Would it have been good to delay the industrial revolution to preserve option value? I think this heuristic, if used in the past, would have generally demanded that we "pause" all sorts of social, material, and moral progress, which seems wrong.

Michael_PJ

Feb 3 2024

I don't think we would have been able to use the additional information we would have gained from delaying the industrial revolution, but I think if we could have, the answer might be "yes". It's easy to see in hindsight that it went well overall, but that doesn't mean that the correct ex ante attitude shouldn't have been caution!

kokotajlod

Feb 6 2024 (15 karma)

1. My understanding is that relatively few EAs are actual hardcore classic hedonist utilitarians. I think this is ~sufficient to explain why more haven't become accelerationists.
2. Have you cornered a classic hedonist utilitarian EA and asked them? Have you cornered three? What did they say?

JWS 🔸

Feb 8 2024

Don't know why this is being disagree-voted. I think point 1 is basically correct - it doesn't take diverging far from being a "hardcore classic hedonist utilitarian" to not support the case Matthew makes in the OP

Will Aldred

Feb 3 2024 (13 karma)

> AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value

Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),^[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.^[2]

> Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic [...]: A belief that AIs won't be conscious, and therefore won't have much moral value compared to humans.

I’ve wondered about this myself. My take is that this area was overlooked a year ago, but there’s now some good work being done. See Jeff Sebo’s Nov ‘23 80k podcast episode, as well as Rob Long’s episode, and the paper that the two of them co-authored at the end of last year: “Moral consideration for AI systems by 2030”. Overall, I’m optimistic about this area becoming a new forefront of EA.

> accelerationism would have, at best, temporary effects

I’m confused by this point, and for me this is the overriding crux between m... (read more)

Ryan Greenblatt

Feb 3 2024

It's worth emphasizing that moral welfare of digital minds is quite a different (though related) topic to whether AIs are good successors.

Will Aldred

Feb 3 2024

Fair point, I’ve added a footnote to make this clearer.

Ryan Greenblatt

Feb 3 2024 (12 karma)

Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.

Thus, a purely longtermist perspective doesn't care about the direct effects of delay/acceleration and the question would come down to indirect effects.
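One way to arrive at a figure like "1 part in 10 billion" is to assume, as a simplification that the comment does not spell out, that reachable cosmic resources are lost at a roughly constant rate over about 10 billion years. Each year of delay then forfeits about one ten-billionth of the total. The timescale constant and function name below are hypothetical, chosen only to reproduce the order of magnitude.

```python
# Hypothetical timescale (in years) over which reachable resources are lost.
# Chosen to match the "1 part in 10 billion" figure; not stated in the comment.
TIMESCALE_YEARS = 10e9


def fraction_of_resources_gained(years_accelerated, timescale_years=TIMESCALE_YEARS):
    """Fraction of total resources gained by arriving earlier, under a linear-loss model."""
    return years_accelerated / timescale_years


one_year_gain = fraction_of_resources_gained(1)
print(f"gain from 1 year of acceleration: {one_year_gain:.1e}")
```

On this linear model, even a decade of acceleration changes the total by about one part in a billion, which is why the comment turns to indirect effects.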

I can see indirect effects going either way, but delay seems better on current margins (this might depend on how much optimism you have on current AI safety progress, governance/policy progress, and whether you think humanity retaining control relative to AIs is good or bad). All of these topics have been explored and discussed to some extent.

When focusing on the welfare/preferences of currently existing people, I think it's unclear if accelerating AI looks good or bad; it depends on optimism about AI safety, how you trade off old people versus young people, and death via violence versus death from old age. (Misaligned AI takeover killing lots of people is by no means assured, but seems reasonably likely by default.)

I expect there hasn't been much investigation of accelerating AI to advance the preferences of currently ... (read more)

Matthew_Barnett

Feb 4 2024

Tiny compared to what? Are you assuming we can take some other action whose consequences don't wash out over the long-term, e.g. because of a value lock-in? In general, these assumptions just seem quite weak and underspecified to me. What exactly is the alternative action that has vastly greater value in expectation, and why does it have greater value? If what you mean is that we can try to reduce the risk of extinction instead, keep in mind that my first bullet point preempted pretty much this exact argument:

Pablo

Feb 4 2024

I think it remains the case that the value of accelerating AI progress is tiny relative to other apparently available interventions, such as ensuring that AIs are sentient or improving their expected well-being conditional on their being sentient. The case for focusing on how a transformative technology unfolds, rather than on when it unfolds,[1] seems robust to a relatively wide range of technologies and assumptions. Still, this seems worth further investigation.

1. ^ Indeed, it seems that when the transformation unfolds is primarily important because of how it unfolds, insofar as the quality of a transformation is partly determined by its timing.

Matthew_Barnett

Feb 4 2024

I'm claiming that it is not actually clear that we can take actions that don't merely wash out over the long-term. In this case, you cannot simply assume that we can meaningfully and predictably affect how valuable the long-term future will be in, for example, billions of years. I agree that, yes, if you assume we can meaningfully affect the very long-run, then all actions that merely have short-term effects will have "tiny" impacts by comparison. But the assumption that we can meaningfully and predictably affect the long-run is precisely the thing that needs to be argued. I think it's important for EAs to try to be more rigorous about their empirical claims here.

Moreover, actions that have short-term effects can generally be assumed to have longer-term effects if our actions propagate. For example, support for larger population sizes now would presumably increase the probability that larger population sizes exist in the very long run, compared to the alternative of smaller population sizes with high per capita incomes. It seems arbitrary to assume this effect will be negligible but then also assume other competing effects won't be negligible. I don't see any strong arguments for this position.

Pablo

Feb 4 2024

I was trying to hint at prima facie plausible ways in which the present generation can increase the value of the long-term future by more than one part in billions, rather than “assume” that this is the case, though of course I never gave anything resembling a rigorous argument.

I do agree that the “washing out” hypothesis is a reasonable default and that one needs a positive reason for expecting our present actions to persist into the long-term. One seemingly plausible mechanism is influencing how a transformative technology unfolds: it seems that the first generation that creates AGI has significantly more influence on how much artificial sentience there is in the universe a trillion years from now than, say, the millionth generation. Do you disagree with this claim?

I’m not sure I understand the point you make in the second paragraph. What would be the predictable long-term effects of hastening the arrival of AGI in the short-term?

Matthew_Barnett

Feb 4 2024

As I understand, the argument originally given was that there was a tiny effect of pushing for AI acceleration, which seems outweighed by unnamed and gigantic "indirect" effects in the long-run from alternative strategies of improving the long-run future. I responded by trying to get more clarity on what these gigantic indirect effects actually are, how we can predictably bring them about, and why we would think it's plausible that we could bring them about in the first place.

From my perspective, the shape of this argument looks something like:

1. Your action X has this tiny positive near-term effect (ETA: or a tiny direct effect)
2. My action Y has this large positive long-term effect (ETA: or a large indirect effect)
3. Therefore, Y is better than X.

Do you see the flaw here? Well, both X and Y could have long-term effects! So, it's not sufficient to compare the short-term effect of X to the long-term effect of Y. You need to compare both effects, on both time horizons. As far as I can tell, I haven't seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt's original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don't see what the exact argument is).

More generally, I think you're probably trying to point to some concept you think is obvious and clear here, and I'm not seeing it, which is why I'm asking you to be more precise and rigorous about what you're actually claiming.

In my original comment I pointed towards a mechanism. Here's a more precise characterization of the argument:

1. Total utilitarianism generally supports, all else being equal, larger population sizes with low per capita incomes over small population sizes with high per capita incomes.
2. To the extent that our actions do not "wash out", it seems reasonable to assume that pushing for large population sizes now would make it more likely in the long-ru

Pablo

Feb 5 2024

Thanks for the clarification. Yes, I agree that we should consider the long-term effects of each intervention when comparing them. I focused on the short-term effects of hastening AI progress because it is those effects that are normally cited as the relevant justification in EA/utilitarian discussions of that intervention. For instance, those are the effects that Bostrom considers in ‘Astronomical waste’. Conceivably, there is a separate argument that appeals to the beneficial long-term effects of AI capability acceleration. I haven’t considered this argument because I haven’t seen many people make it, so I assume that accelerationist types tend to believe that the short-term effects dominate.

Matthew_Barnett

Feb 5 2024

I think Bostrom's argument merely compares a pure x-risk (such as a huge asteroid hurtling towards Earth) relative to technological acceleration, and then concludes that reducing the probability of a pure x-risk is more important because the x-risk threatens the eventual colonization of the universe. I agree with this argument in the case of a pure x-risk, but as I noted in my original comment, I don't think that AI risk is a pure x-risk.

If, by contrast, all we're doing by doing AI safety research is influencing something like "the values of the agents in society in the future" (and not actually influencing the probability of eventual colonization), then this action seems to plausibly just wash out in the long-term. In this case, it seems very appropriate to compare the short-term effects of AI safety to the short-term effects of acceleration.

Let me put it another way. We can think about two (potentially competing) strategies for making the future better, along with their relevant short and possible long-term effects:

* Doing AI safety research
  * Short-term effects: makes it more likely that AIs are kind to current or near-future humans
  * Possible long-term effect: makes it more likely that AIs in the very long-run will share the values of the human species, relative to some unaligned alternative
* Accelerating AI
  * Short-term effect: helps current humans by hastening the arrival of advanced technology
  * Possible long-term effect: makes it more likely that we have a large population size at low per capita incomes, relative to a low population size with high per capita income

My opinion is that both of these long-term effects are very speculative, so it's generally better to focus on a heuristic of doing what's better in the short-term, while keeping the long-term consequences in mind. And when I do that, I do not come to a strong conclusion that AI safety research "beats" AI acceleration, from a total utilitarian perspective.

Ryan Greenblatt

Feb 4 2024

To be clear, this wasn't the structure of my original argument (though it might be Pablo's). My argument was more like "you seem to be implying that action X is good because of its direct effect (literal first order acceleration), but actually the direct effect is small when considered in a particular perspective (longtermism), so for that perspective we need to consider indirect effects and the analysis for that looks pretty different". Note that I wasn't really trying to argue much about the sign of the indirect effect, though people have indeed discussed this in some detail in various contexts.

Matthew_Barnett

Feb 4 2024

I agree your original argument was slightly different than the form I stated. I was speaking too loosely, and conflated what I thought Pablo might be thinking with what you stated originally. I think the important claim from my comment is "As far as I can tell, I haven't seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt's original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don't see what the exact argument is)."

Ryan Greenblatt

Feb 4 2024

Explicitly confirming that this seems right to me.

Ryan Greenblatt

Feb 4 2024

I don't disagree with this. I was just claiming that the "indirect" effects dominate (by indirect, I just mean effects other than shifting the future closer in time). There is still the question of indirect/direct effects.

Matthew_Barnett

Feb 4 2024

I understand that. I wanted to know why you thought that. I'm asking for clarity. I don't currently understand your reasons. See this recent comment of mine for more info.

Ryan Greenblatt

Feb 4 2024

(I don't think I'm going to engage further here, sorry.)

Ryan Greenblatt

Feb 4 2024

Ensuring human control throughout the singularity rather than having AIs get control very obviously has relatively massive effects. Of course, we can debate the sign here, I'm just making a claim about the magnitude. I'm not talking about extinction of all smart beings on earth (AIs and humans), which seems like a small fraction of existential risk. (Separately, the badness of such extinction seems maybe somewhat overrated because pretty likely intelligent life will just re-evolve in the next 300 million years. Intelligent life doesn't seem that contingent. Also aliens.)

Matthew_Barnett

Feb 4 2024

For what it's worth, I think my reply to Pablo here responds to your comment fairly adequately too.

Arepo

Feb 8 2024

I generally agree that we should be more concerned about this. In particular, I find people who will happily approve Shut Up and Multiply sentiment but reject this consideration suspect in their reasoning.

A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons (IIRC naively about 3 orders of magnitude according to Bostrom, before any increase from greater coordination), on strict expected value reasoning you have to be extremely confident that this won't happen - or at least have a much stronger rebuttal than 'AI won't necessarily be conscious'.

Separately, I think there might be a case for accelerationism even if you think it increases the risk of AI takeover and that AI takeover is bad, on the grounds that in many scenarios advancing faster might still increase the probability of human descendants getting through the time of perils before some other threat destroys us (every year we remain in our current state is another year in which we run the risk of, for example, a global nuclear war or civilisation-ending pandemic).

Vasco Grilo🔸

Feb 11 2024

Hi, I have a post where I conclude the above may well apply not only to digital consciousness, but also to animals:

Ben Millwood🔸

Feb 3 2024

A lot of these points seem like arguments that it's possible that unaligned AI takeover will go well, e.g. there's no reason not to think that AIs are conscious, or will have interesting moral values, or etc. My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good. AIs may be conscious and may have welfare-promoting values, but we don't know that yet. We should try to better understand whether AIs are worthy successors before transitioning power to them. Probably a core point of disagreement here is whether, presented with a "random" intelligent actor, we should expect it to promote welfare or prevent suffering "by default". My understanding is that some accelerationists believe that we should. I believe that we shouldn't. Moreover I believe that it's enough to be substantially uncertain about whether this is or isn't the default to want to take a slower and more careful approach.

Matthew_Barnett

Feb 4 2024

My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good.

I claim there's a weird asymmetry here where you're happy to put trust into humans because they have the "potential" to do good, but you're not willing to say the same for AIs, even though they seem to have the same type of "potential".

Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we actually have a ton of evidence about the quality and character of human nature, and it doesn't make humans look great. Humans are not mainly described as altruistic creatures. I mentioned factory farming in my original comment, but one can examine the way people spend their money (i.e. not mainly on charitable causes), or the history of genocides, war, slavery, and oppression for additional evidence.

Probably a core point of disagreement here is whether, presented with a "random" intelligent actor, we should expect it to promote welfare or prevent suffering "by default".

I don't expect humans to "promote welfare or prevent suffering"...

Ben Millwood🔸

Feb 4 2024

It seems like you're just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don't think this is certain by any means, but I think it's a reasonable extrapolation. (I maybe don't expect you to find it a reasonable extrapolation.)

Meanwhile I expect the typical unaligned AI may seize power for some purpose that seems to us entirely trivial, and may be uninterested in doing any kind of moral philosophy, and/or may not place any terminal (rather than instrumental) value in paying attention to other sentient experiences in any capacity. I do think humans, even with their kind of terrible track record, are more promising than that baseline, though I can see why other people might think differently.

Ben Millwood🔸

Feb 5 2024

I haven't read your entire post about this, but I understand you believe that if we created aligned AI, it would get essentially "current" human values, rather than e.g. some improved / more enlightened iteration of human values. If instead you believed the latter, that would set a significantly higher bar for unaligned AI, right?

Matthew_Barnett

Feb 5 2024

That's right, if I thought human values would improve greatly in the face of enormous wealth and advanced technology, I'd definitely be open to seeing humans as special and extra valuable from a total utilitarian perspective. Note that many routes through which values could improve in the future could apply to unaligned AIs too. So, for example, I'd need to believe that humans would be more likely to reflect, and be more likely to do the right type of reflection, relative to the unaligned baseline. In other words it's not sufficient to argue that humans would reflect a little bit; that wouldn't really persuade me at all.

elifland

Feb 3 2024

(edit: my point is basically the same as emre's) I think there is very likely at some point going to be some sort of transition to a world where AIs are effectively in control. It seems worth it to slow down on the margin to try to shape this transition as best we can, especially slowing it down as we get closer to AGI and ASI. It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse). Delaying the arrival of AGI by a few years as we get close to it seems good regardless of parameters like the value of involuntary-AI-disempowerment futures. But delaying the arrival by 100s of years seems more likely bad due to the tradeoff with other risks.

Matthew_Barnett

Feb 3 2024

Two questions here:

1. Why would accelerating AI make the transition less voluntary? (In my own mind, I'd be inclined to reverse this sentiment a bit: delaying AI by regulation generally involves forcibly stopping people from adopting AI. Force might be justified if it brings about a greater good, but that's not the argument here.)
2. I can understand being "careful". Being careful does seem like a good thing. But "being careful" generally trades off against other values in almost every domain I can think of, and there is such a thing as too much of a good thing. What reason is there to think that pushing for "more caution" is better on the margin compared to acceleration, especially considering society's default response to AI in the absence of intervention?

elifland

Feb 3 2024

1. So in the multi-agent slowly-replacing case, I'd argue that individual decisions don't necessarily represent a voluntary decision on behalf of society (I'm imagining something like this scenario). In the misaligned power-seeking case, it seems obvious to me that this is involuntary. I agree that it technically could be a collective voluntary decision to hand over power more quickly, though (and in that case I'd be somewhat less against it).
2. I think emre's comment lays out the intuitive case for being careful / taking your time, as does Ryan's. I think the empirics are a bit messy once you take into account benefits of preventing other risks but I'd guess they come out in favor of delaying by at least a few years.

Ryan Greenblatt

Feb 3 2024

I don't think this is a crux. Even if you prefer unaligned AI values over likely human values (weighted by power), you'd probably prefer doing research on further improving AI values over speeding things up.

Isaac Dunn

Feb 3 2024

I think misaligned AI values should be expected to be worse than human values, because it's not clear that misaligned AI systems would care about eg their own welfare. Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it's not clear from a total utilitarian perspective that the outcome would be bad. But the "values" of a misaligned AI system could be pretty arbitrary, so I don't think we should expect that.

JWS 🔸

Feb 8 2024

So I think it's likely you have some very different beliefs from most people/EAs/myself, particularly:

1. Thinking that humans/humanity is bad, and AI is likely to be better
2. Thinking that humanity isn't driven by ideational/moral concerns[1]
3. That AI is very likely to be conscious, moral (as in, making better moral judgements than humans), and that the current/default trend in the industry is very likely to make them conscious moral agents in a way humans aren't

I don't know if the total utilitarian/accelerationist position in the OP is yours or not. I think Daniel is right that most EAs don't have this position. I think maybe Peter Singer gets closest to this in his interview with Tyler on the 'would you side with the Aliens or not question' here. But the answer to your descriptive question is simply that most EAs don't have the combination of moral and empirical views about the world to make the argument you present valid and sound, so that's why there isn't much talk in EA about naïve accelerationism.

Going off the vibe I get from this view though, I think it's a good heuristic that if your moral view sounds like a movie villain's monologue it might be worth reflecting, and a lot of this post reminded me of the Earth-Trisolaris Organisation from Cixin Liu's Three Body Problem. If someone's honest moral view is "Eliminate human tyranny! The world belongs to Trisolaris AIs!" then I don't know what else there is to do except quote Zvi's phrase "please speak directly into this microphone".

Another big issue I have with this post is that some of the counter-arguments just seem a bit like 'nu-uh', see:

These (and other examples) are considerations for sure, but they need to be argued for. I don't think they can just be stated and then say "therefore, ACCELERATE!". I agree that AI Safety research needs to be more robust and the philosophical assumptions and views made more explicit, but one could already think of some counters to the questions that you rai

Matthew_Barnett

Feb 8 2024

I don't think humanity is bad. I just think people are selfish, and generally driven by motives that look very different from impartial total utilitarianism. AIs (even potentially "random" ones) seem about as good in expectation, from an impartial standpoint. In my opinion, this view becomes even stronger if you recognize that AIs will be selected on the basis of how helpful, kind, and useful they are to users. (Perhaps notice how different this selection criteria is from the evolutionary criteria used to evolve humans.)

I understand that most people are partial to humanity, which is why they generally find my view repugnant. But my response to this perspective is to point out that if we're going to be partial to a group on the basis of something other than utilitarian equal consideration of interests, it makes little sense to choose to be partial to the human species as opposed to the current generation of humans or even myself. And if we take this route, accelerationism seems even more strongly supported than before, since developing AI and accelerating technological progress seems to be the best chance we have of preserving the current generation against aging and death. If we all died, and a new generation of humans replaced us, that would certainly be pretty bad for us.

Which sounds more like a movie villain's monologue?

* The idea that everyone currently living needs to be sacrificed, and die, in order to preserve the human species
* The idea that we should try to preserve currently living people, even if that means taking on a greater risk of not preserving the values of the human species

To be clear, I also just totally disagree with the heuristic that "if your moral view sounds like a movie villain's monologue it might be worth reflecting". I don't think that fiction is generally a great place for learning moral philosophy, albeit with some notable exceptions.

Anyway, the answer to these moral questions may seem obvious to you, but I don't think they'r

Ryan Greenblatt

Feb 8 2024

This is not why people disagree IMO.

Matthew_Barnett

Feb 8 2024

I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me. But, fair enough, I exaggerated a bit. My true belief is a more moderate version of that claim.

When discussing why EAs in particular disagree with me, to overgeneralize by a fair bit, I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society. I'd call this "being partial to humanity", or at least, "being partial to the values of the human species". (In my opinion, this partiality seems so prevalent and deep in most people that to deny it seems a bit like a fish denying the existence of water. But I digress.)

To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:

* "a society of humans who are very similar to us"
* "a society of people who look & act like humans, but each of them only cares about their family"
* "a society of people who look & act like humans, but they only care about maximizing paperclips"

I emphasized that in each case, the people are human-level in their intelligence, and also biological.

The results are preliminary (and I'm not linking here to avoid biasing the results, as voting has not yet finished), but so far my followers, who are mostly EAs, are much more happy to let the humans immigrate to our world, compared to the last two options. I claim there just aren't really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity. My guess is that if people are asked to defend their choice explicitly, they'd largely talk about some inherent altruism or hope they place in the human species, relative to the other options; and this still looks like "being p

Ryan Greenblatt

Feb 9 2024

I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.

Maybe, it's hard for me to know. But I predict most of the pushback you're getting from relatively thoughtful longtermists isn't due to this.

I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society.

I agree with this.

I'd call this "being partial to humanity", or at least, "being partial to the values of the human species".

I think "being partial to humanity" is a bad description of what's going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don't have about (e.g.) aliens.

To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
"a society of humans who are very similar to us"
"a

...

Ryan Greenblatt

Feb 9 2024

To be clear, it seems totally reasonable to call this "being partial to some notion of moral thoughtfulness about pain, pleasure, and preferences", but these concepts don't seem that "human" to me. (I predict these occur pretty frequently in evolved life that reaches a singularity for instance. And they might occur in AIs, but I expect misaligned AIs which seize control of the world are worse from my perspective than if humans retain control.)

Matthew_Barnett

Feb 9 2024

When I say that people are partial to humanity, I'm including an irrational bias towards thinking that humans, or evolved beings, are unusually thoughtful or ethical compared to the alternatives (I believe this is in fact an irrational bias, since the arguments I've seen for thinking that unaligned AIs will be less thoughtful or ethical than aliens seem very weak to me). In other cases, when people irrationally hold a certain group X to a higher standard than a group Y, it is routinely described as "being partial to group Y over group X". I think this is just what "being partial" means, in an ordinary sense, across a wide range of cases. For example, if I proposed aligning AI to my local friend group, with the explicit justification that I thought my friends are unusually thoughtful, I think this would be well-described as me being "partial" to my friend group. To the extent you're seeing me as saying something else about how longtermists view the argument, I suspect you're reading me as saying something stronger than what I originally intended.

Ryan Greenblatt

Feb 9 2024

In that case, my main disagreement is thinking that your twitter poll is evidence for your claims. More specifically: Like you claim there aren't any defensible reasons to think that what humans will do is better than literally maximizing paper clips? This seems totally wild to me.

Matthew_Barnett

Feb 9 2024

I'm not exactly sure what you mean by this. There were three options, and human paperclippers were only one of these options. I was mainly discussing the choice between (1) and (2) in the comment, not between (1) and (3). Here's my best guess at what you're saying: it sounds like you're repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative. But the point of my previous comment was to state my view that this bias counted as "being partial towards humanity", since I view the bias as irrational. In light of that, what part of my comment are you objecting to? To be clear, you can think the bias I'm talking about is actually rational; that's fine. But I just disagree with you for pretty mundane reasons. [Incorporating what you said in the other comment] Then I think it's worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don't need to write this up yourself, of course. I think the EA community should write these reasons up. Because I currently view the proposition as non-obvious, and despite being a critical belief in AI risk discussions, it's usually asserted without argument. When I've pressed people in the past, they typically give very weak reasons. I don't know how to respond to an argument whose details are omitted.

Ryan Greenblatt

Feb 10 2024

+1, but I don't generally think it's worth counting on "the EA community" to do something like this. I've been vaguely trying to pitch Joe on doing something like this (though there are probably better uses of his time) and his recent blog posts are touching similar topics.

Ryan Greenblatt

Feb 10 2024

Also, this is usually only a crux for longtermists, which is probably one of the reasons why no one has gotten around to this.

Ryan Greenblatt

Feb 10 2024

You didn't make this clear, so I was just responding generically. Separately, I think I feel a pretty similar intuition for case (2): people literally only caring about their families seems pretty clearly worse.

Ryan Greenblatt

Feb 10 2024

There, I'm just saying that human control is better than literal paperclip maximization.

Matthew_Barnett

Feb 10 2024

This response still seems underspecified to me. Is the default unaligned alternative paperclip maximization in your view? I understand that Eliezer Yudkowsky has given arguments for this position, but it seems like you diverge significantly from Eliezer's general worldview, so I'd still prefer to hear this take spelled out in more detail from your own point of view.

Ryan Greenblatt

Feb 10 2024

Your poll says: And then you say: So, I think more human control is better than more literal paperclip maximization, the option given in your poll. My overall position isn't that the AIs will certainly be paperclippers, I'm just arguing in isolation about why I think the choice given in the poll is defensible.

Matthew_Barnett

Feb 10 2024

I have the feeling we're talking past each other a bit. I suspect talking about this poll was kind of a distraction. I personally have the sense of trying to convey a central point, and instead of getting the point across, I feel the conversation keeps slipping into talking about how to interpret minor things I said, which I don't see as very relevant. I will probably take a break from replying for now, for these reasons, although I'd be happy to catch up some time and maybe have a call to discuss these questions in more depth. I definitely see you as trying a lot harder than most other EAs in trying to make progress on these questions collaboratively with me.

JWS 🔸

Feb 10 2024

I'd be very happy to have some discussion on these topics with you Matthew. For what it's worth, I really have found much of your work insightful, thought-provoking, and valuable. I think I just have some strong, core disagreements on multiple empirical/epistemological/moral levels with your latest series of posts. That doesn't mean I don't want you to share your views, or that they're not worth discussion, and I apologise if I came off as too hostile. An open invitation to have some kind of deeper discussion stands.[1]

1. ^ I'd like to try out the new dialogue feature on the Forum, but that's a weak preference

Ryan Greenblatt

Feb 10 2024

Agreed, sorry about that.

Ryan Greenblatt

Feb 9 2024

Also, to be clear, I agree that the question of "how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective" is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.

[anonymous]

Feb 3 2024

Strongly agree that there should be more explicit defences of this argument. One way of doing this in a co-operative way might be working on co-operative AI stuff, since it seems to increase the likelihood that misaligned AI goes well, or at least less badly.

Robi Rahman

Feb 5 2024

I'm guessing preference utilitarians would typically say that only the preferences of conscious entities matter. I doubt any of them would care about satisfying an electron's "preference" to be near protons rather than ionized.

Matthew_Barnett

Feb 5 2024

Perhaps. I don't know what most preference utilitarians believe. Are you familiar with Brian Tomasik? (He's written about suffering of fundamental particles, and also defended preference utilitarianism.)

Pivocajs

Feb 7 2024

My personal reason for not digging into this is my naive model of how good the AI future is: quality_of_future * amount_of_the_stuff. And there is a distinction I haven't seen you acknowledge: while high "quality" doesn't require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.) Because of this, my impression is that if we hand over the future to a random AI, the "quality" will be very low. And so we can currently have a much larger impact by focusing on increasing the quality. Which we can do by delaying "handing over the future to AI" and picking a good AI to hand over to. I.e., alignment. (Still, I agree it would be nice if there was a better analysis of this, which exposed the assumptions.)

Matthew_Barnett

Feb 7 2024

Is there any particular reason why you are partial towards humans generically controlling the future, relative to this particular current generation of humans? To me, it seems like being partial to one's own values, one's community, and especially one's own life, generally leads to an even stronger argument for accelerationism, since the best way to advance your own values is generally to actually "be there" when AI happens. In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one's current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?

Pivocajs

Feb 21 2024

I agree with this. I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.

In more detail: Sure, all else being equal, me being there when AI happens is mildly helpful. But the outcome of building AI seems to be a function of, among other things, (i) values of the people building it + (ii) how much reflection they can do on those values + (iii) the environment dynamics these people are subject to (e.g., the current race dynamics between AI companies). And over time, I expect the potential decrease in (i) to be far outweighed by gains in (ii) and (iii).

* The first issue is about (i), that it is not actually me building the AGI, either now or in the future. But I am willing to grant that (all else being equal) the current generation is more likely to have values closer to my values.
* However, I expect that factors (ii) and (iii) are just as influential. Regarding (ii), it seems we keep making progress at philosophy, ethics, etc, and to me, this currently far outweighs the value drift in (i).
* Regarding (iii), my impression is that the current situation is so bad that it can't get much worse, and we might as well wait. This of course depends on how likely you think we are to get a bad outcome if we either (a) get superintelligence without additional progress on alignment or (b) get widespread human-level AI with no progress on alignment, institution design, etc.

Matthew_Barnett

Feb 21 2024

I agree some people (such as yourself) might be extremely altruistic, and therefore might not care much about their own life relative to other values they hold, but this position is fairly uncommon. Most people care a lot about their own lives (and especially the lives of their family and friends) relative to other things they care about. We can empirically test this hypothesis by looking at how people choose to spend their time and money; and the results are generally that people spend their money on themselves, their family and their friends. You don't need to be director of the world to have influence over things. You can just be a small part of the world to have influence over things that you care about. This is essentially what you're already doing by living and using your income to make decisions, to satisfy your own preferences. I'm claiming this situation could and probably will persist into the indefinite future, for the agents that exist in the future. I'm very skeptical that there will ever be a moment in time during which there will be a "director of the world", in a strong sense. And I doubt the developer of the first AGI will become the director of the world, even remotely (including versions of them that reflect on moral philosophy etc.). You might want to read my post about this.

Vasco Grilo🔸

Feb 11 2024

Great points, Matthew! I have wondered about this too. Relatedly, readers may want to check the sequence otherness and control in the age of AGI from Joe Carlsmith, in particular, Does AI risk “other” the AIs?. One potential argument against accelerating AI is that it will increase the chance of catastrophes which will then lead to overregulating AI (e.g. in the same way that nuclear power arguably was overregulated).

Matthew_Barnett

Dec 31 2024

It is becoming increasingly clear to many people that the term "AGI" is vague and should often be replaced with more precise terminology. My hope is that people will soon recognize that other commonly used terms, such as "superintelligence," "aligned AI," "power-seeking AI," and "schemer," suffer from similar issues of ambiguity and imprecision, and should also be approached with greater care or replaced with clearer alternatives.

To start with, the term "superintelligence" is vague because it encompasses an extremely broad range of capabilities above human intelligence. The differences within this range can be immense. For instance, a hypothetical system at the level of "GPT-8" would represent a very different level of capability compared to something like a "Jupiter brain", i.e., an AI with the computing power of an entire gas giant. When people discuss "what a superintelligence can do" the lack of clarity around which level of capability they are referring to creates significant confusion. The term lumps together entities with drastically different abilities, leading to oversimplified or misleading conclusions.

Similarly, "aligned AI" is an ambiguous term because it means differen...

Michael Noetel 🔸

Feb 21

How do you feel about this framework?

Ozzie Gooen

Jan 1

I agree, I've also been thinking about this. I think there's a great deal of interesting work here, to try to put together better terminology. My guess is that it would be difficult to change all dialogue using this vocabulary anytime soon, but even shifting some of the research dialogue could go a long way.

[anonymous]

Jan 2

I worked in advertising agencies for almost a decade. People there complain about terminology too. But it never gets fixed because that's not how linguistics / culture works. This is an intractable problem and only useful for insiders who feel like venting.

Matthew_Barnett

Jan 2

Most analytic philosophers, lawyers, and scientists have converged on linguistic norms that are substantially more precise than the informal terminology employed by LessWrong-style speculation about AI alignment. So this is clearly not an intractable problem; otherwise these people in other professions could not have made their language more precise. Rather, success depends on incentives and the willingness of people within the field to be more rigorous.

Habryka [Deactivated]Jan 312

I don't think this is true, or at least I think you are misrepresenting the tradeoffs and diversity here. There is some publication bias here because people are more precise in papers, but honestly, scientists are also not more precise than many top LW posts in the discussion section of their papers, especially when covering wider-ranging topics.

Predictive coding papers use language incredibly imprecisely, analytic philosophy often uses words in really confusing and inconsistent ways, economists (especially macroeconomists) throw out various terms in quite imprecise ways.

But also, as soon as you leave the context of official publications and instead look at lectures, or books, or private letters, you will see people use language much less precisely, and those contexts are where a lot of the relevant intellectual work happens. Especially when scientists start talking about the kind of stuff that LW likes to talk about, like intelligence and philosophy of science, there is much less rigor. (Also, I recommend people read A Human's Guide to Words as a general set of arguments for why "precise definitions" are really not viable as a constraint on language.)


Matthew_BarnettOct 13 202344

AI safety

I might elaborate on this at some point, but I thought I'd write down some general reasons why I'm more optimistic than many EAs on the risk of human extinction from AI. I'm not defending these reasons here; I'm mostly just stating them.

Skepticism of foom: I think it's unlikely that a single AI will take over the whole world and impose its will on everyone else. I think it's more likely that millions of AIs will be competing for control over the world, in a similar way that millions of humans are currently competing for control over the world. Power or wealth might be very unequally distributed in the future, but I find it unlikely that it will be distributed so unequally that there will be only one relevant entity with power. In a non-foomy world, AIs will be constrained by norms and laws. Absent severe misalignment among almost all the AIs, I think these norms and laws will likely include a general prohibition on murdering humans, and there won't be a particularly strong motive for AIs to murder every human either.
Skepticism that value alignment is super-hard: I haven't seen any strong arguments that value alignment is very hard, in contrast to the straightforward empirical evide

... (read more)

RobertMOct 14 202313

ETA: feel free to ignore the below, given your caveat, though you may find it helpful if you choose to write an expanded form of any of the arguments later to have some early objections.

Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense (since if it is, effectively all of them break down as reasons for optimism)? To wit:

Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don't very much value the well-being of others don't have the power to actually expropriate everyone else's resources by force. (We have evidence of what happens when those conditions break down to any meaningful degree; it isn't super pretty.)
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment. In particular, the claim that "GPT-4 seems to be honest, kind, and helpful after relatively little effort" seems to be treating GPT-4's behavior as meaningfully reflecting its internal preferences or motivations, which I think is "not even wrong". I think it's extremely unlikely that GPT-4 has preferences over world states in a

... (read more)

Matthew_BarnettOct 14 202312

Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense

No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.

Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don't very much value the well-being of others don't have the power to actually expropriate everyone else's resources by force.

Can you clarify what you are saying here? If I understand you correctly, you're saying that humans have relatively little wealth inequality because there's relatively little inequality in power between humans. What does that imply about AI?

I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.

I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.

I'm curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?

I think it's extremely unlikely that GPT-4 has preferences over world states in a way that most humans wou

... (read more)

SiebeRozendal

Oct 14 2023

Okay so these are two analogies: individual humans & groups/countries. First off, "surviving" doesn't seem like the right thing to evaluate, more like "significant harm"/"being exploited". Can you give some examples where individual humans have a clear strategic decisive advantage (i.e. very low risk of punishment), where the low-power individual isn't at a high risk of serious harm? Because the examples I can think of are all pretty bad: dictators, slaveholders, husbands in highly patriarchal societies. Sexual violence is extremely prevalent and is pretty much always in a high power difference context. I find the US example unconvincing, because I find it hard to imagine the US benefiting more from aggressive use of force than from trade and soft economic exploitation. The US doesn't have the power to successfully occupy countries anymore. When there were bigger power differences due to technology, we had the age of colonialism.

Matthew_Barnett

Oct 14 2023

Why are we assuming a low risk of punishment? Risk of punishment depends largely on social norms and laws, and I'm saying that AIs will likely adhere to a set of social norms. I think the central question is whether these social norms will include the norm "don't murder humans". I think such a norm will probably exist, unless almost all AIs are severely misaligned. I think severe misalignment is possible; one can certainly imagine it happening. But I don't find it likely, since people will care a lot about making AIs ethical, and I'm not yet aware of any strong reasons to think alignment will be super-hard.

Matthew_BarnettSep 21 202345

(Clarification about my views in the context of the AI pause debate)

I'm finding it hard to communicate my views on AI risk. I feel like some people are responding to the general vibe they think I'm giving off rather than the actual content. Other times, it seems like people will focus on a narrow snippet of my comments/post and respond to it without recognizing the context. For example, one person interpreted me as saying that I'm against literally any AI safety regulation. I'm not.

For a full disclosure, my views on AI risk can be loosely summarized as follows:

I think AI will probably be very beneficial for humanity.
Nonetheless, I think that there are credible, foreseeable risks from AI that could do vast harm, and we should invest heavily to ensure these outcomes don't happen.
I also don't think technology is uniformly harmless. Plenty of technologies have caused net harm. Factory farming is a giant net harm that might have even made our entire industrial civilization a mistake!
I'm not blindly against regulation. I think all laws can and should be viewed as forms of regulations, and I don't think it's feasible for society to exist without laws.
That said, I'm also not blindly in fav

... (read more)

Chris Leong

Sep 21 2023

Thanks, that seems like a pretty useful summary.

Matthew_BarnettFeb 4 202434

It seems to me that a big crux about the value of AI alignment work is what target you think AIs will ultimately be aligned to in the future in the optimistic scenario where we solve all the "core" AI risk problems to the extent they can be feasibly solved, e.g. technical AI safety problems, coordination problems, the problem of having "good" AI developers in charge etc.

There are a few targets that I've seen people predict AIs will be aligned to if we solve these problems: (1) "human values", (2) benevolent moral values, (3) the values of AI developers, (4) the CEV of humanity, (5) the government's values. My guess is that a significant source of disagreement that I have with EAs about AI risk is that I think none of these answers are actually very plausible. I've written a few posts explaining my views on this question already (1, 2), but I think I probably didn't make some of my points clear enough in these posts. So let me try again.

In my view, in the most likely case, it seems that if the "core" AI risk problems are solved, AIs will be aligned to the primarily selfish individual revealed preferences of existing humans at the time of alignment. This essentially refers to the the... (read more)

MichaelStJules

Feb 4 2024

EDIT: I guess I'd think of human values as what people would actually just sincerely and directly endorse without further influencing them first (although maybe just asking them makes them take a position if they didn't have one before, e.g. if they've never thought much about the ethics of eating meat). I think you're overstating the differences between revealed and endorsed preferences, including moral/human values, here. Probably only a small share of the population thinks eating meat is wrong or bad, and most probably think it's okay. Even if people generally would find it wrong or bad after reflecting long enough (I'm not sure they actually would), that doesn't reflect their actual values now. Actual human values do not generally find eating meat wrong. To be clear, you can still complain that humans' actual/endorsed values are also far from ideal and maybe not worth aligning with, e.g. because people don't care enough about nonhuman animals or helping others. Do people care more about animals and helping others than an unaligned AI would, in expectation, though? Honestly, I'm not entirely sure. Humans may care about animal welfare somewhat, but they also specifically want to exploit animals in large part because of their values, specifically food-related taste, culture, traditions and habit. Maybe people will also want to specifically exploit artificial moral patients for their own entertainment, curiosity or scientific research on them, not just because the artificial moral patients are generically useful, e.g. for acquiring resources and power and enacting preferences (which an unaligned AI could be prone to).

MichaelStJules

Feb 4 2024

I illustrate some other examples here on the influence of human moral values on companies. This is all of course revealed preferences, but my point is that revealed preferences can importantly reflect endorsed moral values. People influence companies in part on the basis of what they think is right through demand, boycotts, law, regulation and other political pressure. Companies, for the most part, can't just go around directly murdering people (companies can still harm people, e.g. through misinformation on the health risks of their products, or because people don't care enough about the harms). (Maybe this is largely for selfish reasons; people don't want to be killed themselves, and there's a slippery slope if you allow exceptions.) GPT has content policies that reflect people's political/moral views. Social media companies have use and content policies and have kicked off various users for harassment, racism, or other things that are politically unpopular, at least among a large share of users or advertisers (which also reflect consumers). This seems pretty standard. Many companies have boycotted Russia since the invasion of Ukraine. Many companies have also committed to sourcing only cage-free eggs after corporate outreach and campaigns, despite cage-free egg consumption being low. X (Twitter)'s policies on hate speech have changed under Musk, presumably primarily because of his views. That seems to have cost X users and advertisers, but X is still around and popular, so it also shows that some potentially important decisions about how a technology is used are largely in the hands of the company and its leadership, not just driven by profit. I'd likewise guess it actually makes a difference that the biggest AI labs are (I would assume) led and staffed primarily by liberals. They can push their own views onto their AI even at the cost of some profit and market share. And some things may have minimal near term consequences for demand or profit, but could be

Ryan Greenblatt

Feb 4 2024

I basically buy that the values we get will be similar to just giving existing humans massive amounts of wealth, but I'm less sold that this will result in outcomes which are well described as "primarily selfish". I feel like your comment is equivocating between "the situation is similar to making existing humans massively wealthy" and "of course this will result in primarily selfish usage similar to how the median person behaves with marginal money now".

Matthew_Barnett

Feb 4 2024

Current humans definitely seem primarily selfish (although I think they also care about their family and friends too; I'm including that). Can you explain why you think giving humans a lot of wealth would turn them into something that isn't primarily selfish? What's the empirical evidence for that idea?

Ryan Greenblatt

Feb 4 2024

The behavior of billionaires, which maybe indicates more like 10% of income spent on altruism. ETA: This is still literally majority selfish, but it's also plausible that 10% altruism is pretty great and looks pretty different than "current median person behavior with marginal money". (See my other comment about the percent of cosmic resources.)

Matthew_Barnett

Feb 4 2024

The idea that billionaires have 90% selfish values seems consistent with a claim of having "primarily selfish" values in my opinion. Can you clarify what you're objecting to here?

Ryan Greenblatt

Feb 4 2024

The literal words of "primarily selfish" don't seem that bad, but I would maybe prefer majority selfish? And your top level comment seems like it's not talking about/emphasizing the main reason to like human control which is that maybe 10-20% of resources are spent well. It just seemed odd to me to not mention that "primarily selfish" still involves a pretty big fraction of altruism.

Matthew_Barnett

Feb 4 2024

I agree it's important to talk about and analyze the (relatively small) component of human values that are altruistic. I mostly just think this component is already over-emphasized. Here's one guess at what I think you might be missing about my argument: 90% selfish values + 10% altruistic values isn't the same thing as, e.g., 90% valueless stuff + 10% utopia. The 90% selfish component can have negative effects on welfare from a total utilitarian perspective, that aren't necessarily outweighed by the 10%. 90% selfish values is the type of thing that produces massive factory farming infrastructure, with a small amount of GDP spent mitigating suffering in factory farms. Does the small amount of spending mitigating suffering outweigh the large amount of spending directly causing suffering? This isn't clear to me. (Alternatively, you could think that unaligned AIs will be 100% selfish, and this is clearly worse. But I'd want to understand how you could come to that conclusion, carefully. "Altruism" also encompasses a broad range of activities, and not all of it is utopian or idealistic from a total utilitarian perspective. For example, human spending on environmental conservation might be categorized as "altruism" in this framework, although personally I would say that form of spending is not very "moral" due to wild animal suffering.)

Ryan Greenblatt

Feb 4 2024

Yep, this can be true, but I'm skeptical this will matter much in practice. I typically think things which aren't directly optimizing for value or disvalue won't have intended effects which are very important and that in the future unintended effects (externalities) won't be that much of total value/disvalue. When we see the selfish consumption of current very rich people, it doesn't seem like the intentional effects are that morally good/bad relative to the best/worst uses of resources. (E.g. owning a large boat and having people think you're high status aren't that morally important relative to altruistic spending of similar amounts of money.) So for current very rich people the main issue would be that the economic process for producing the goods has bad externalities. And, I expect that as technology advances, externalities reduce in moral importance relative to intended effects. Partially this is based on crazy transhumanist takes, but I feel like there is some broader perspective in which you'd expect this. E.g. for factory farming, the ultimately cheapest way to make meat in the limit of technological maturity would very likely not involve any animal suffering. Separately, I think externalities will probably look pretty similar for selfish resource usage for unaligned AIs and humans because most serious economic activities will be pretty similar.

Ryan Greenblatt

Feb 4 2024

I'd like to explicitly note that I don't think this is true in expectation for a reasonable notion of "selfish". Though I maybe think something which is sort of in this direction if we use a relatively narrow notion of altruism.

JWS 🔸

Feb 8 2024

How are we defining selfish here? It seems like a pretty strong position to take on the topic of psychological egoism? Especially including family/friends in terms of selfish? In your original post, you say: But I don't know, it seems that as countries and individuals get wealthier, we seem to on the whole be getting better? Maybe factory farming acts against this, but the idea that factory farming is immoral and should be abolished exists and I think is only going to grow. I don't think humans are just slaves to our base wants/desires, and think that is a remarkably impoverished view of both individual human psychology and social morality. As such, I don't really agree with much of this post. An AGI, when built, will be able to generate new ideas and hypotheses about the world, including moral ones. A strong-but-narrow AI could be worse (e.g. optimal-factory-farm-PT), but then the right response here isn't really technical alignment, it's AI governance and moral persuasion in general.

aog

Feb 9 2024

This seems to underrate the arguments for Malthusian competition in the long run. If we develop the technical capability to align AI systems with any conceivable goal, we'll start by aligning them with our own preferences. Some people are saints, and they'll make omnibenevolent AIs. Other people might have more sinister plans for their AIs. The world will remain full of human values, with all the good and bad that entails. But current human values do not maximize our reproductive fitness. Maybe one human will start a cult devoted to sending self-replicating AI probes to the stars at almost light speed. That person's values will influence far-reaching corners of the universe that later humans will struggle to reach. Another human might use their AI to persuade others to join together and fight a war of conquest against a smaller, weaker group of enemies. If they win, their prize will be hardware, software, energy, and more power that they can use to continue to spread their values. Even if most humans are not interested in maximizing the number and power of their descendants, those who are will have the most numerous and most powerful descendants. This selection pressure exists even if the humans involved are ignorant of it; even if they actively try to avoid it. I think it's worth splitting the alignment problem into two quite distinct problems: 1. The technical problem of intent alignment. Solving this does not solve coordination problems. There will still be private information and coordination problems after intent alignment is solved, therefore we'll still face coordination problems, fitter strategies will proliferate, and the world will be governed by values that maximize fitness. 2. "Civilizational alignment"? Much harder problem to solve. The traditional answer is a Leviathan, or Singleton as the cool kids have been saying. It solves coordination problems, allowing society to coherently pursue a long-run objective such as flourishing rather

Matthew_Barnett

Feb 9 2024

I'm mostly talking about what I expect to happen in the short-run in this thread. But I appreciate these arguments (and agree with most of them). Plausibly my main disagreement with the concerns you raised is that I think coordination is maybe not very hard. Coordination seems to have gotten stronger over time, in the long-run. AI could also potentially make coordination much easier. As Bostrom has pointed out, historical trends point towards the creation of a Singleton. I'm currently uncertain about whether to be more worried about a future world government becoming stagnant and inflexible. There's a real risk that our institutions will at some point entrench an anti-innovation doctrine that prevents meaningful changes over very long time horizons out of a fear that any evolution would be too risky. As of right now I'm more worried about this potential failure mode versus the failure mode of unrestrained evolution, but it's a close competition between the two concerns.

Ryan Greenblatt

Feb 4 2024

What percent of cosmic resources do you expect to be spent thoughtfully and altruistically? 0%? 10%? I would guess the thoughtful and altruistic subset of resources dominates in most scenarios where humans retain control. Then, my main argument for why human control would be good is that the fraction isn't that small (more like 20% in expectation than 0%) and that unaligned AI takeover seems probably worse than this. Also, as an aside, I agree that little good public argumentation has been made about the relative value of unaligned AI control vs human control. I'm sympathetic to various discussions from Paul Christiano and Joe Carlsmith, but the public scope and detail is pretty limited thus far.

Matthew_BarnettFeb 24 202432

In some circles that I frequent, I've gotten the impression that a decent fraction of existing rhetoric around AI has gotten pretty emotionally charged. And I'm worried about the presence of what I perceive as demagoguery regarding the merits of AI capabilities and AI safety. Out of a desire to avoid calling out specific people or statements, I'll just discuss a hypothetical example for now.

Suppose an EA says, "I'm against OpenAI's strategy for straightforward reasons: OpenAI is selfishly gambling everyone's life in a dark gamble to make themselves immortal." Would this be a true, non-misleading statement? Would this statement likely convey the speaker's genuine beliefs about why they think OpenAI's strategy is bad for the world?

To begin to answer these questions, we can consider the following observations:

It seems likely that AI powerful enough to end the world would also be powerful enough to do lots of incredibly positive things, such as reducing global mortality and curing diseases. By delaying AI, we are therefore equally "gambling everyone's life" by forcing people to face ordinary mortality.
Selfish motives can be, and frequently are, aligned with the public intere

... (read more)

Ben Millwood🔸

Apr 5 2024

I encourage you not to draw dishonesty inferences from people worried about job losses from AI automation, just because: * it seems like almost no other technologies stood to automate such a broad range of labour essentially simultaneously, * other innovative technologies often did face pushback from people whose jobs were threatened, and generally there have been significant social problems in the past when an economy moves away from people's existing livelihoods (I'm thinking of e.g. coal miners in 1970s / 1980s Britain, though it's not something I know a lot about), * even if the critique doesn't stand up under from-first-principles scrutiny, lots of people think it's a big deal, so if it's a mistake it's surely an understandable one from someone who weighs other opinions (too?) seriously. I think it's reasonable to argue that this worry is wrong, I just think it's a pretty understandable opinion to hold and want to talk about, and I don't feel like it's compelling evidence that someone is deliberately trying to seek out arguments in order to advance a position.

Ryan Greenblatt

Apr 5 2024

See also "The costs of caution", which discusses AI upsides in a relatively thoughtful way.

Matthew_BarnettMay 5 202422

In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective.

At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective.

The core thesis that I was trying to defend is the following view:

My view: It is likely that by default, unaligned AIs (AIs that humans are likely to actually build if we do not completely solve key technical alignment problems) will produce utilitarian value comparable to that of humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely be conscious in a morally relevant sense and will likely share human moral concepts, since they will be trained on human data.

Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a sema... (read more)

Ryan Greenblatt

May 5 2024

My proposed counter-argument, loosely based on the structure of yours. Summary of claims * A reasonable fraction of computational resources will be spent based on the result of careful reflection. * I expect to be reasonably aligned with the result of careful reflection from other humans. * I expect to be much less aligned with the result of AIs-that-seize-control reflecting due to less similarity and the potential for AIs to pursue relatively specific objectives from training (things like reward seeking). * Many arguments that human resource usage won't be that good seem to apply equally well to AIs and thus aren't differential. Full argument The vast majority of value from my perspective on reflection (where my perspective on reflection is probably somewhat utilitarian, but this is somewhat unclear) in the future will come from agents who are trying to optimize explicitly for doing "good" things and are being at least somewhat thoughtful about it, rather than those who incidentally achieve utilitarian objectives. (By "good", I just mean what seems to them to be good.) At present, the moral views of humanity are a hot mess. However, it seems likely to me that a reasonable fraction of the total computational resources of our lightcone (perhaps 50%) will in expectation be spent based on the result of a process in which an agent or some agents think carefully about what would be best in a pretty deliberate and relatively wise way. This could involve eventually deferring to other smarter/wiser agents or massive amounts of self-enhancement. Let's call this a "reasonably-good-reflection" process. Why think a reasonable fraction of resources will be spent like this? * If you self-enhance and get smarter, this sort of reflection on your values seems very natural. The same for deferring to other smarter entities. Further, entities in control might live for an extremely long time, so if they don't lock in something, as long as they eventually get around to being thoug

Ryan Greenblatt

May 5 2024

Suppose that a single misaligned AI takes control and it happens to care somewhat about its own happiness while not having any more "altruistic" tendencies that I would care about or you would care about. (I think misaligned AIs which seize control caring about their own happiness substantially seems less likely than not, but let's suppose this for now.) (I'm saying "single misaligned AI" for simplicity, I get that a messier coalition might be in control.) It now has access to vast amounts of computation after sending out huge numbers of probes to take control over all available energy. This is enough computation to run absolutely absurd amounts of stuff. What are you imagining it spends these resources on which is competitive with optimized goodness? Running >10^50 copies of itself which are heavily optimized for being as happy as possible while spending? If a small number of agents have a vast amount of power, and these agents don't (eventually, possibly after a large amount of thinking) want to do something which is de facto like the values I end up caring about upon reflection (which is probably, though not certainly, vaguely like utilitarianism in some sense), then from my perspective it seems very likely that the resources will be squandered. If you're imagining something like: 1. It thinks carefully about what would make "it" happy. 2. It realizes it cares about having as many diverse good experience moments as possible in a non-indexical way. 3. It realizes that heavy self-modification would result in these experience moments being better and more efficient, so it creates new versions of "itself" which are radically different and produce more efficiently good experiences. 4. It realizes it doesn't care much about the notion of "itself" here and mostly just focuses on good experiences. 5. It runs vast numbers of such copies with diverse experiences. Then this is just something like utilitarianism by another name via a different line of reasoning. I

Matthew_BarnettJan 26 202427

I'm considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I'd post an outline of that post here first as a way of judging what's currently unclear about my argument, and how it interacts with people's cruxes.

Current outline:

In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, and automated decision-making at virtually every level of our society.

Broadly speaking, there are (at least) two main approaches you can take now to try to improve our chances of AI going well:

Try to constrain, delay, or obstruct AI, in order to reduce risk, mitigate negative impacts, or give us more time to solve essential issues. This includes, for example, trying to make sure AIs aren't able to take certain actions (i.e. ensure they are controlled).
Try to set up a good institutional environment, in order to safely and smoothly manage the transition to an AI-dominated world, regardless of when this transition occurs. This mostly

... (read more)

Chris Leong

Jan 27 2024

I'm confused: surely we should want to avoid an AI coup? We may decide to give up control of our future to a singleton, but if we do this, then it should be intentional.

Matthew_Barnett

Jan 27 2024

I agree we should try to avoid an AI coup. Perhaps you are falling victim to the following false dichotomy? * We either allow a set of AIs to overthrow our institutions, or * We construct a singleton: a sovereign world government managed by AI that rules over everyone Notably, there is a third option: * We incorporate AIs into our existing social, economic, and legal institutions, flexibly adapting our social structures to cope with technological change without our whole system collapsing

Chris Leong

Jan 27 2024

I wasn't claiming that these were the only two possibilities here (for example, another possibility would be that we never actually build AGI). My suspicion is that a lot of your ideas here sound reasonable on the abstract level, but once you dive into what they actually mean on a concrete level and how these mechanisms will concretely operate, it'll be clear that they're a lot less appealing. Anyway, that's just a gut intuition, obvs. it'll be easier to judge when you publish your write-up.

Roman Leventov

Jan 26 2024

I'm excited to see you posting this. My views agree very closely with yours. I summarised my views a few days ago here. One of the most important similarities is that we both emphasise the importance of decision-making and supporting it with institutions. This could be seen as an "enactivist" view on agent (human, AI, hybrid, team/organisation) cognition. The biggest difference between our views is that I think the "cognitivist" agenda (i.e., agent internals and algorithms) is as important as the "enactivist" agenda (institutions), whereas you seem to almost disregard the "cognitivist" agenda. I disagree with putting risk-detection/mitigation mechanisms, algorithms, and monitoring in that bucket. I think we should just separate between engineering (cf. A plea for solutionism on AI safety) and non-engineering (policy, legislature, treaties, commitments, advocacy) approaches. In particular, the "scheming control" agenda that you link will be a concrete engineering practice that should be used in the training of safe AI models in the future, even if we have good institutions, good decision-making algorithms wrapped on top of these AI models, etc. It's not an "alternative path" just for "non-AI-dominated worlds". The same applies to monitoring, interpretability, evals, etc. processes. All of these will require very elaborate engineering on their own. I 100% agree with your reasoning about Frames 1 and 2. I want to discuss the following point in detail because it's a rare view in EA/LW circles: In my post, I also made a similar point: “aligning LLMs with human values” is hardly a part of [the problem of context alignment] at all". But my framing was in general not very clear, so I'll try to improve it and integrate it with your take here: Context alignment is a pervasive process that happens (and is sometimes needed) on all timescales: evolutionary, developmental, and online (the examples of the latter in humans: understanding, empathy, rapport). The skill of context a

Nick K.

Jan 26 2024

I like your proposed third frame as a somewhat hopeful vision for the future. Instead of pointing out why you think the other frames are poor, I think it would be helpful to maintain a more neutral approach and elaborate which assumptions each frame makes and give a link to your discussion about these in a sidenote.

Matthew_Barnett

Jan 26 2024

The problem is that I am not trying to portray a "somewhat hopeful vision", but rather present a framework for thinking clearly about AI risks, and how to mitigate them. I think the other frames are not merely too pessimistic: I think they are actually wrong, or at least misleading, in important ways that would predictably lead people to favor bad policy if taken seriously. It's true that I'm likely more optimistic along some axes than most EAs when it comes to AI (although I tend to think I'm less optimistic when it comes to things like whether moral reflection will be a significant force in the future). However, arguing for generic optimism is not my aim. My aim is to improve how people think about future AI.

Nick K.

Jan 26 2024

Noted! The key point I was trying to make is that I'd think it helpful for the discourse to separate 1) how one would act in a frame and 2) why one thinks each one is more or less likely (which is more contentious and easily gets a bit political). Since your post aims at the former, and the latter has been discussed at more length elsewhere, it would make sense to further de-emphasize the latter.

Matthew_Barnett

Jan 26 2024

My post aims at both. It is a post about how to think about AI, and a large part of that is establishing the "right" framing.

Matthew_BarnettJan 13 202427

(A clearer and more fleshed-out version of this argument is now a top-level post. Read that instead.)

I strongly dislike most AI risk analogies that I see EAs use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused, and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, when in fact no such credible model exists.

Here are two particularly egregious examples of analogies I see a lot that I think are misleading in this way:

The analogy that AIs could be like aliens.
The analogy that AIs could treat us just like how humans treat animals.

I think these analogies are typically poor because, when evaluated carefully, they establish almost nothing of importance beyond the logical possibility of severe AI misalignment. Worse, they give the impression of a model for how we should think about AI behavior, even when the speak... (read more)

Steven Byrnes

Jan 13 2024

Yes you have!—including just two paragraphs earlier in that very comment, i.e. you are using the analogy “future AI is very much like today’s LLMs but better”. :) Cf. what I called “left-column thinking” in the diagram here. For all we know, future AIs could be trained in an entirely different way from LLMs, in which case the way that “LLMs are already being trained” would be pretty irrelevant in a discussion of AI risk. That’s actually my own guess, but obviously nobody knows for sure either way. :)

Rafael Harth

Jan 13 2024

I read your first paragraph and was like "disagree", but when I got to the examples, I was like "well of course I agree here, but that's only because those analogies are stupid". At least one analogy I'd defend is the Sorcerer's Apprentice one. (Some have argued that the underlying model has aged poorly, but I think that's a red herring since it's not the analogy's fault.) I think it does share important features with the classical x-risk model.

Matthew_BarnettFeb 4*4

[This shortform comment has now been superseded by a slightly longer post.]

Many effective altruists have shown interest in expanding moral consideration to AIs, which I appreciate. However, in my experience, these EAs have primarily focused on AI welfare—mostly by ensuring that AIs are treated well and protected from harm—rather than advocating for AI rights, which has the potential to grant AIs legal autonomy and freedoms. While these two approaches overlap significantly, there is a tendency for these approaches to come apart in the following way:

A welfar

... (read more)

Matthew_BarnettJan 12 202414

I want to challenge an argument that I think drives a lot of AI risk intuitions. I think the argument goes something like this:

There is something called "human values".
Humans broadly share "human values" with each other.
It would be catastrophic if AIs lacked "human values".
"Human values" are an extremely narrow target, meaning that we need to put in exceptional effort in order to get AIs to be aligned with human values.

My problem with this argument is that "human values" can refer to (at least) three different things, and under every plausible interpretation, the argument appears internally inconsistent.

Broadly speaking, I think "human values" usually refers to one of three concepts:

The individual objectives that people pursue in their own life (i.e. the individual human desire for wealth, status, and happiness, usually for themselves or their family and friends)
The set of rules we use to socially coordinate (i.e. our laws, institutions, and social norms)
Our cultural values (i.e. the ways that human societies have broadly differed from each other, in their languages, tastes, styles, etc.)

Under the first interpretation, I think premise (2) of the original argum... (read more)

Karthik Tadepalli

Jan 12 2024

When I think of values I think of interpretation #2, and I don't think you prove that P4 is untrue under that interpretation. The idea is that humans are both a) constrained and b) generally inclined to follow some set of rules. An AI would be neither constrained nor necessarily inclined to follow these rules. Virtually all historical and present atrocities are framed in terms of determining who is a person and who is not. Why would AIs see us as having moral personhood?

Matthew_Barnett

Jan 12 2024

P4 is about whether human values are an extremely narrow target, not about whether AIs will necessarily be inclined to follow them, or necessarily constrained by them. I agree it is logically possible for AIs to exist who would try to murder humans; indeed, there are already humans who try to do that to others. The primary question is instead about how narrow of a target the value "don't murder" or "don't steal" is, and whether we need to put in exceptional effort in order to hit these targets. Among humans, it seems the specific target here is not very narrow, despite our greatly varying individual objectives. This fact provides a hint at how narrow our basic social mechanisms really are, in my opinion. Here again I would say the question is more about whether thinking that humans have relevant personhood is an extremely narrow target, not about whether AIs will necessarily see us as persons. They may see us as persons, and maybe they won't. But the idea that they would doesn't seem very unnatural. For one, if AIs are created in something like our current legal system, the concept of legal personhood will already be extended to humans by default. It seems pretty natural for future people to inherit legal concepts from the past. And all I'm really arguing here is that this isn't an extremely narrow target to hit, not that it must happen by necessity.

Karthik Tadepalli

Jan 12 2024

I guess "narrow target" is just an underspecified part of your argument then, because I don't know what it's meant to capture if not "in most plausible scenarios, AI doesn't follow the same set of rules as humans".

Matthew_Barnett

Jan 12 2024

Can you outline the case for thinking that "in most plausible scenarios, AI doesn't follow the same set of rules as humans"? To clarify, by "same set of rules" here I'm imagining basic legal rules: do not murder, do not steal etc. I'm not making a claim that specific legal statutes will persist over time. It seems to me both that: * To the extent that AIs are our descendants, they should inherit our legal system, legal principles, and legal concepts, similar to how e.g. the United States inherited legal principles from the United Kingdom. We should certainly expect our legal system to change over time as our institutions adapt to technological change. But, absent a compelling reason otherwise, it seems wrong to think that "do not murder a human" will go out the window in "most plausible scenarios". * Our basic legal rules seem pretty natural, rather than being highly contingent. It's easy to imagine plenty of alien cultures stumbling upon the idea of property rights, and implementing the rule "do not steal from another legal person".

Karthik Tadepalli

Jan 13 2024

My point is that AI could plausibly have rules for interacting with other "persons", and those rules could look much like ours, but that we will not be "persons" under their code. Consider how "do not murder" has never applied to animals. If AIs treat us like we treat animals then the fact that they have "values" will not be very helpful to us.

Matthew_Barnett

Jan 13 2024

I think AIs will be trained on our data, and will be integrated into our culture, having been deliberately designed for the purpose of filling human-shaped holes in our economy, to automate labor. This means they'll probably inherit our social concepts, in addition to most other concepts we have about the physical world. This situation seems disanalogous to the way humans interact with animals in many ways. Animals can't even speak language. Anyway, even the framing you have given seems like a partial concession towards my original point. A rejection of premise 4 is not equivalent to the idea that AIs will automatically follow our legal norms. Instead, it was about whether "human values" are an extremely narrow target, in the sense of being a natural vs. contingent set of values that are very hard to replicate in other circumstances. If the way AIs relate to human values is similar to how humans relate to animals, then I'll point out that many existing humans already find the idea of caring about animals to be quite natural, even if most ultimately decide not to take the idea very far. Compare the concept of "caring about animals" to "caring about paperclip maximization". In the first instance, we have robust examples of people actually doing that, but hardly any examples of people in the second instance. This is after all because caring about paperclip maximization is an unnatural and arbitrary thing to care about relative to how most people conceptualize the world. Again, I'm not saying AIs will necessarily care about human values. That was never the claim. The entire question was about whether human values are an "extremely narrow target". And I think, within this context, given the second interpretation of human values in my original comment, the original thesis seems to have held up fine.

Matthew_BarnettDec 12 202314

Here's a fictional dialogue with a generic EA that I think can perhaps help explain some of my thoughts about AI risks compared to most EAs:

EA: "Future AIs could be unaligned with human values. If this happened, it would likely be catastrophic. If AIs are unaligned, they'll hide their intentions until they're in a position to strike in a violent coup, and then the world will end (for us at least)."

Me: "I agree that sounds like it would be very bad. But maybe let's examine why this scenario seems plausible to you. What do you mean when you say AIs might be unaligned with human values?"

EA: "Their utility functions would not overlap with our utility functions."

Me: "By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense. Nor does this fact automatically imply the world will end for a given group within humanity."

EA: "Sure, but that's because humans mostly all have s... (read more)

Jaime SevillaDec 12 202314

I have so many axes of disagreement that it is hard to figure out which one is most relevant. I guess let's go one by one.

Me: "What do you mean when you say AIs might be unaligned with human values?"

I would say that pretty much every agent other than me (and probably me in different times and moods) are "misaligned" with me, in the sense that I would not like a world where they get to dictate everything that happens without consulting me in any way.

This is a quibble because in fact I think if many people were put in such a position they would try asking others what they want and try to make it happen.

Consider a random retirement home. Compared to the rest of the world, it has basically no power. If the rest of humanity decided to destroy or loot the retirement home, there would be virtually no serious opposition.

This hypothetical assumes too much, because people outside care about the lovely people in the retirement home, and they represent their interests. The question is, will some future AIs with relevance and power care for humans, as humans become obsolete?

I think this is relevant, because in the current world there is a lot of variety. There are people who care about ret... (read more)

Habryka [Deactivated]Dec 12 202314

EA: "Their utility functions would not overlap with our utility functions."
Me: "By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense."
EA: "Sure, but that's because humans are all roughly the same intelligence and/or capability. Future AIs will be way smarter and more capable than humans."

Just for the record, this is when I got off the train for this dialogue. I don't think humans are misaligned with each other in the relevant ways, and if I could press a button to have the universe be optimized by a random human's coherent extrapolated volition, then that seems great and thousands of times better than what I expect to happen with AI-descendants. I believe this for a mixture of game-theoretic reasons and genuinely thinking that other humans' values do really actually capture most of what I care about.

Matthew_Barnett

Dec 12 2023

In this part of the dialogue, when I talk about a utility function of a human, I mean roughly their revealed preferences, rather than their coherent extrapolated volition (which I also think is underspecified). This is important because it is our revealed preferences that better predict our actual behavior, and the point I'm making is simply that behavioral misalignment is common in this sense among humans. And also this fact does not automatically imply the world will end for a given group of humans within humanity.

peterbarnett

Dec 12 2023

This is missing a very important point, which is that I think humans have morally relevant experience and I'm not confident that misaligned AIs would. When the next generation replaces the current one this is somewhat ok because those new humans can experience joy, wonder, adventure etc. My best guess is that AIs that take over and replace humans would not have any morally relevant experience, and basically just leave the universe morally empty. (Note that this might be an ok outcome if by default you expect things to be net negative) I also think that there is way more overlap in the "utility functions" between humans, than between humans and misaligned AIs. Most humans feel empathy and don't want to cause others harm. I think humans would generally accept small costs to improve the lives of others, and a large part of why people don't do this is because people have cognitive biases or aren't thinking clearly. This isn't to say that any random human would reflectively become a perfectly selfless total utilitarian, but rather that most humans do care about the wellbeing of other humans. By default, I don't think misaligned AIs will really care about the wellbeing of humans at all.

Matthew_Barnett

Dec 12 2023

I don't think that's particularly likely, but I can understand if you think this is an important crux. For what it's worth, I don't think it matters as much whether the AIs themselves are sentient, but rather whether they care about sentience. For example, from the perspective of sentience, humans weren't necessarily a great addition to the world, because of their contribution to suffering in animal agriculture (although I'm not giving a confident take here). Even if AIs are not sentient, they'll still be responsible for managing the world, and creating structures in the universe. When this happens, there's a lot of ways for sentience to come about, and I care more about the lower level sentience that the AI manages than the actual AIs at the top who may or may not be sentient.

harfe

Dec 12 2023

I think this is a big moral difference: We do not actively kill the older humans so that we can take over. We care about older people, and societies that are rich enough spend some resources to keep older people alive longer. The entirety of humanity being killed and replaced by the kind of AI that places so little moral value on us humans would be catastrophically bad, compared to things that are currently occurring.

Matthew_BarnettOct 12 202312

I find it slightly strange that EAs aren't emphasizing semiconductor investments more given our views about AI.

(Maybe this is because of a norm against giving investment advice? This would make sense to me, except that there's also a cultural norm about criticizing charities that people donate to, and EAs seemed to blow right through that one.)

I commented on this topic last year. Later, I was informed that some people have been thinking about this and acting on it to some extent, but overall my impression is that there's still a lot of potential value left on the table. I'm really not sure though.

Since I might be wrong and I don't really know what the situation is with EAs and semiconductor investments, I thought I'd just spell out the basic argument, and see what people say:

Credible models of economic growth predict that, if AI can substitute for human labor, then we should expect the year-over-year world economic growth rate to dramatically accelerate, probably to at least 30% and maybe to rates as high as 300% or 3000%.
This rate of growth should be sustainable for a while before crashing, since physical limits appear to permit far more economic value than we're currently g

... (read more)

Erich_Grunewald 🔸Oct 12 202315

I mostly agree with this (and did also buy some semiconductor stock last winter).

Besides plausibly accelerating AI a bit (which I think is a tiny effect at most unless one plans to invest millions), a possible drawback is motivated reasoning (e.g., one may feel less inclined to think critically of the semi industry, and/or less inclined to favor approaches to AI governance that reduce these companies' revenue). This may only matter for people who work in AI governance, and especially compute governance.

Matthew_BarnettJan 28 20249

I'm considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I'm eliciting feedback on an outline of this post here in order to determine what's currently unclear or weak about my argument.

The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when h... (read more)

Chris Leong

Jan 28 2024

Your argument in objection 1 doesn't the position people who are worried about an absurd offense-defense imbalance. Additionally: It may be that no agent can take over the world, but that an agent can destroy the world. Would someone build something like that? Sadly, I think the answer is yes.

Matthew_Barnett

Jan 28 2024

I'm having trouble parsing this sentence. Can you clarify what you meant? What incentive is there to destroy the world, as opposed to take it over? If you destroy the world, aren't you sacrificing yourself at the same time?

Chris Leong

Jan 28 2024

Oh, I can see why it is ambiguous. I meant whether it is easier to attack or defend, which is separate from the "power" attackers have and defenders have. "What incentive is there to destroy the world, as opposed to take it over? If you destroy the world, aren't you sacrificing yourself at the same time?" Some would be willing to do that if they can't take it over.

Matthew_Barnett

Jan 28 2024

What reason is there to think that AI will shift the offense-defense balance absurdly towards offense? I admit such a thing is possible, but it doesn't seem like AI is really the issue here. Can you elaborate?

Ryan Greenblatt

Jan 28 2024

I think the main abstract argument for why this is plausible is that AI will change many things very quickly and in a high variance way. And some human processes will lag behind heavily. This could plausibly (though not obviously) lead to offense dominance.

Chris Leong

Jan 28 2024

I'm not going to fully answer this question, b/c I have other work I should be doing, but I'll toss in one argument. If different domains (cyber, bio, manipulation, etc.) have different offense-defense balances, a sufficiently smart attacker will pick the domain with the worst balance. This recurses down further for at least some of these domains where they aren't just a single thing, but a broad collection of vaguely related things.

JWS 🔸

Feb 8 2024

I sympathise with/agree with many of your points here (and in general regarding AI x-risk), but something about this recent sequence of quick-takes isn't landing with me in the way some of your other work has. I'll try and articulate why in some cases, though I apologise if I misread or misunderstand you. On this post, these two premises/statements raised an eyebrow: To me, this is just as unsupported as people who are incredibly certain that there will be a 'treacherous turn'. I get this is a supposition/alternative hypothesis, but how can you possibly hold a premise that a system of laws will persist indefinitely? This sort of reminds me of the Leahy/Bach discussion where Bach just says 'it's going to align itself with us if it wants to if it likes us if it loves us'. I kinda want more than that: if we're going to build these powerful systems, saying 'trust me bro, it'll follow our laws and norms and love us back' doesn't sound very convincing to me. (For clarity, I don't think this is your position or framing, and I'm not a fan of the classic/Yudkowskian risk position. I want to say I find both perspectives unconvincing) Secondly, people abide by systems of laws and norms, but we also have many cases where individuals/parties/groups overturned these norms when they had accumulated enough power and didn't feel the need to abide by the existing regime. This doesn't have to look like the traditional DSA model where humanity gets instantly wiped out, but I don't see why there couldn't be a future where an AI makes a move like Sulla using force to overthrow and depower the opposing factions, or the 18 Brumaire.

Will Aldred

Jan 28 2024

For what it’s worth, the Metaculus crowd forecast for the question “Will transformative AI result in a singleton (as opposed to a multipolar world)?” is currently “60%”. That is, forecasters believe it’s more likely than not that there won’t be competing AIs with comparable power, which runs counter to your claim. (I bring this up seeing as you make a forecasting-based argument for your claim.)

Matthew_BarnettFeb 24 20245

Some people seem to think the risk from AI comes from AIs gaining dangerous capabilities, like situational awareness. I don't really agree. I view the main risk as simply arising from the fact that AIs will be increasingly integrated into our world, diminishing human control.

Under my view, the most important thing is whether AIs will be capable of automating economically valuable tasks, since this will prompt people to adopt AIs widely to automate labor. If AIs have situational awareness, but aren't economically important, that's not as concerning.

The risk... (read more)

Matthew_BarnettMar 2 20209

I hold a few core ethical ideas that are extremely unpopular: the idea that we should treat the natural suffering of animals as a grave moral catastrophe, the idea that old age and involuntary death is the number one enemy of humanity, the idea that we should treat so-called farm animals with a very high level of compassion.

Given the unpopularity of these ideas, you might be tempted to think that the reason they are unpopular is that they are exceptionally counterintuitive ones. But is that the case? Do you really need a modern education and philosophical t... (read more)

Matthew_BarnettMar 13 20205

I have now posted as a comment on Lesswrong my summary of some recent economic forecasts and whether they are underestimating the impact of the coronavirus. You can help me by critiquing my analysis.

Matthew_BarnettMay 1 20203

A trip to Mars that brought back human passengers also has the chance of bringing back microbial Martian passengers. This could be an existential risk if microbes from Mars harm our biosphere in a severe and irreparable manner.

From Carl Sagan in 1973, "Precisely because Mars is an environment of great potential biological interest, it is possible that on Mars there are pathogens, organisms which, if transported to the terrestrial environment, might do enormous biological damage - a Martian plague, the twist in the plot of H. G. Wells' War of the ... (read more)

Matthew_BarnettJan 29 20240

In response to human labor being automated, a lot of people support a UBI funded by a tax on capital. I don't think this policy is necessarily unreasonable, but if later the UBI gets extended to AIs, this would be pretty bad for humans, whose only real assets will be capital.

As a result, the unintended consequence of such a policy may be to set a precedent for a massive wealth transfer from humans to AIs. This could be good if you are utilitarian and think the marginal utility of wealth is higher for AIs than humans. But selfishly, it's a big cost.