The standard case for delaying AI appears to rest on non-utilitarian assumptions

Matthew_Barnett

Among effective altruists, it is sometimes claimed that delaying AI development for safety reasons is ethically justified based on straightforward utilitarian logic—particularly the idea that reducing existential risk has overwhelming moral value. However, I believe this claim is mistaken. While the primary argument for delaying AI may appear utilitarian on the surface, I think it actually depends on deeper ethical assumptions that are not strictly utilitarian in nature.

To be clear, I am not arguing that one cannot, in theory, construct a logically consistent utilitarian argument for delaying AI. One could, for instance, argue that the AIs we create won't be conscious in a way that has moral value, or that misaligned AI will lead to immense suffering—making it worthwhile to delay AI if doing so would genuinely act to mitigate these specific outcomes. My claim is not that such an argument would be logically incoherent. Rather, my claim is that the standard case for delaying AI—the argument most commonly made in effective altruist discussions—seems to not actually rely on these premises. Instead, it appears to rest on an implicit speciesist assumption that prioritizes the survival of the human species itself, rather than purely impartial utilitarian concerns for maximizing well-being or preventing suffering.

In this post, I try to demonstrate this claim. First, I outline what I see as the "standard case" for delaying AI from an EA longtermist perspective. I then argue that, despite the common perception that this case follows straightforward utilitarian reasoning, it actually seems primarily driven by a preference for preserving the human species as a category—even when this conflicts with traditional utilitarian objectives like maximizing the well-being of all sentient beings, including future AI entities.

The "standard case" for delaying AI

While there is considerable debate on this topic, here I will outline what I perceive as the "standard case" for delaying AI development for safety reasons. This is based on numerous discussions I have had with EAs about this topic over the last few years, as well as my personal review of many articles and social media posts advocating for pausing or delaying AI.

However, I want to emphasize that this is not the only reasoning used to justify delaying AI—there is indeed significant variation in how different EAs approach this issue. In other words, I am not claiming that this argument fully captures the views of all, or even most, EAs who have thought about this subject. Nonetheless, I believe the following argument is still broadly representative of a common line of reasoning:

Step 1: The Astronomical Waste Argument

This step in the argument claims that reducing existential risk—even by a tiny amount—is overwhelmingly more valuable than accelerating technological progress. The reasoning is that an existential catastrophe would eliminate nearly all future value, whereas hastening technological advancement (e.g., space colonization, AGI, etc.) would only move technological maturity forward by a short period of time. Given this, even a minor reduction in existential risk is argued to be vastly more important than accelerating progress toward a utopian future.

Step 2: AI is an existential risk that can be mitigated by delaying AGI

This step in the argument claims that slowing down AI development gives us more time to conduct safety research, which in turn reduces the risk that future AGIs will have values misaligned with human interests. By delaying AGI, we increase our ability to develop adequate regulatory safeguards and technical alignment techniques, thereby lowering the probability of an AI-driven existential catastrophe, whereby the human species either goes extinct or is radically disempowered.

Conclusion: The moral case for delaying AGI

Based on these reasoning steps, the conclusion is that delaying AGI is morally justified because it meaningfully reduces existential risk, and the value of this risk reduction vastly outweighs any negative consequences for currently existing humans. While delaying AGI may postpone medical breakthroughs and other technological advancements—thereby shortening the lifespans of people alive today, and forcing them to endure avoidable suffering for longer—this cost is seen as negligible in comparison to the overwhelming moral importance of preventing an AI-induced existential catastrophe that could wipe out all future generations of humans.

Why the standard case for delaying AI seems to rest on non-utilitarian assumptions

It may be tempting to believe that the argument I have just outlined closely parallels the argument for prioritizing other existential risks—such as the risk of a giant asteroid impact. However, these arguments are actually quite distinct.

To illustrate, consider the hypothetical scenario of a massive asteroid on a direct collision course with Earth. If this asteroid were to strike, it would not only wipe out all currently existing human life but also eliminate the possibility of any future civilization emerging. This means that all potential future generations—who could have gone on to colonize the universe and create an astronomically large amount of moral value—would never come into existence. According to the astronomical waste argument, preventing this catastrophe would be of overwhelming moral importance because the value of ensuring that future civilizations do emerge vastly outweighs the relatively minor concern of whether civilization emerges slightly earlier or later.

At first glance, proponents of delaying AI might want to claim that their argument follows the same logic. However, this would be misleading. The key difference is that in most existential risk scenarios, civilization itself would be completely destroyed, whereas in the case of AI risk, civilization would continue to exist—just under AI control rather than human control.

In other words, even if AIs were to drive humans to extinction or permanently disempower humanity, this would not necessarily mean that all future moral value is lost. AIs could still go on to build an intergalactic civilization, ensuring that complex life continues to exist, expand, and potentially flourish across the universe—just without humans. This means that the long-term future would still be realized, but in a form that does not involve human beings playing the central role.

This distinction is crucial because it directly undermines the application of the astronomical waste argument to AI existential risk. The astronomical waste argument applies most straightforwardly to scenarios where all future potential value is permanently destroyed—such as if a catastrophic asteroid impact wiped out all complex life on Earth, preventing any future civilization from emerging. But if AIs take over and continue building an advanced civilization, then the universe would still be filled with intelligent beings capable of creating vast amounts of moral value. The primary difference is that these beings would be AIs rather than biological humans.

This matters because, from a longtermist utilitarian perspective, the fundamental goal is to maximize total utility over the long term, without privileging any specific group based purely on arbitrary characteristics like species or physical substrate. A consistent longtermist utilitarian should therefore, in principle, give moral weight to all sentient beings, whether they are human or artificial. If one truly adheres to this impartial framework, then they would have no inherent preference for a future dominated by biological humans over one dominated by highly intelligent AIs.

Of course, one can still think—as I do—that human extinction would be a terrible outcome for the people who are alive when it occurs. Even if the AIs that replace us are just as morally valuable as we are from an impartial moral perspective, it would still be a moral disaster for all currently existing humans to die. However, if we accept this perspective, then we must also acknowledge that, from the standpoint of people living today, there appear to be compelling reasons to accelerate AI development rather than delay it for safety reasons.

The reasoning is straightforward: if AI becomes advanced enough to pose an existential threat to humanity, then it would almost certainly also be powerful enough to enable massive technological progress—potentially revolutionizing medicine, biotechnology, and other fields in ways that could drastically improve and extend human lives. For example, advanced AI could help develop cures for aging, eliminate extreme suffering, and significantly enhance human health through medical and biological interventions. These advancements could allow many people who are alive today to live much longer, healthier, and more fulfilling lives.

As economist Chad Jones has pointed out, delaying AI development means that the current generation of humans risks missing out on these transformative benefits. If AI is delayed for years or decades, a large fraction of people alive today—including those advocating for AI safety—would not live long enough to experience these life-extending technologies. This leads to a strong argument for accelerating AI, at least from the perspective of present-day individuals, unless one is either unusually risk-averse, or they have a very high confidence (such as above 50%) that AI will lead to human extinction.

To be clear, if someone genuinely believes there is a high probability that AI will wipe out humanity, then I agree that delaying AI would seem rational, since the high risk of personal death would outweigh the small possibility of a dramatically improved life. But for those who see AI extinction risk as relatively low (such as below 15%), accelerating AI development appears to be the more pragmatic personal choice.

Thus, while human extinction would undoubtedly be a disastrous event, the idea that even a small risk of extinction from AI justifies delaying its development—even if that delay results in large numbers of currently existing humans dying from preventable causes—is not supported by straightforward utilitarian reasoning. The key question here is what extinction actually entails. If human extinction means the total disappearance of all complex life and the permanent loss of all future value, then mitigating even a small risk of such an event might seem overwhelmingly important. However, if the outcome of human extinction is simply that AIs replace humans—while still continuing civilization and potentially generating vast amounts of moral value—then the reasoning behind delaying AI development changes fundamentally.

In this case, the clearest and most direct tradeoff is not about preventing "astronomical waste" in the classic sense (i.e., preserving the potential for future civilizations) but rather about whether the risk of AI takeover is acceptable to the current generation of humans. In other words, is it justifiable to impose costs on presently living people—including delaying potentially life-saving medical advancements—just to reduce a relatively small probability that humanity might be forcibly replaced by AI? This question is distinct from the broader existential risk arguments that typically focus on preserving all future potential value, and it suggests that delaying AI is not obviously justified by utilitarian logic alone.

From a historical perspective, existential transitions—where one form of life is replaced by another—are not uncommon. Mass extinctions have occurred repeatedly throughout Earth's history, yet they have not resulted in the total elimination of all complex life or all utilitarian moral value. If AIs were to replace humans, it would be a transition of similar nature, not necessarily a total moral catastrophe in the way that true extinction of all complex life would be.

Another natural process that mirrors the pattern of one form of life being replaced by another is ordinary generational replacement. By this, I am referring to the fact that, as time passes, each generation of humans inevitably ages and dies, and a new generation is born to take its place. While this cycle preserves the human species as a whole, it still follows the fundamental pattern of one group of individuals—who once fully controlled the world—entirely disappearing and being replaced by another group that did not previously exist.

Once we recognize these parallels, it becomes clearer that AI existential risk is functionally more similar to a generational transition between different forms of intelligent life than it is to the total extinction of all complex life. The key difference is that, instead of new biological humans replacing old biological humans, future AI entities would replace humans altogether. But functionally, both processes involve one intelligent group dying out and another taking over, continuing civilization in a new form.

This realization highlights a fundamental assumption underlying the "standard case" for delaying AI: it is not primarily based on a concern for the survival of individual human beings, or the continuity of civilization, but rather on a speciesist preference for the survival of the human species as a category.

The assumption is that the death of humanity is uniquely catastrophic not because intelligent life or civilization would end, but because the human species itself would no longer exist. Here, humanity is not being valued merely as a collection of currently living individuals but as an abstract genetic category—one that is preserved across generations through biological reproduction. The implicit belief appears to be that even though both humans and AIs would be capable of complex thought and moral reasoning, only humans belong to the privileged genetic category of "humanity", which is assumed to have special moral significance.

This speciesist assumption suggests that the true moral concern driving the argument for delaying AI is not the loss of future moral value in general, but rather the loss of specifically human control over that value. If AI were to replace humans, civilization would not disappear—only the genetic lineage of Homo sapiens would. The claim that this constitutes an "existential catastrophe" is therefore not based on the objective loss of complex life that could create moral value, but on the belief that only human life (or biological life), as opposed to artificial life, is truly valuable.

As a result, the standard argument for delaying AI fundamentally relies on prioritizing the survival of the human species as a category, rather than simply the survival of sentient beings capable of experiencing value, or improving the lives of people who currently exist. This assumption is rarely made explicit, but once recognized, it undermines the idea that AI-driven human extinction is straightforwardly comparable to an asteroid wiping out all life. Instead, it becomes clear that the argument is rooted in a preference for human biological continuity—one that is far more species-centric than purely utilitarian in nature.

Can a utilitarian case be made for delaying AI?

So far I have written about the "standard case" for delaying AI development, as I see it. However, to be clear, I am not denying that one could construct a purely utilitarian argument for why AIs might generate less moral value than humans, and thus why delaying AI could be justified. My main point, however, is that evidence supporting such an argument is rarely made explicit or provided in discussions on this topic.

For instance, one common claim is that the key difference between humans and AIs is consciousness—that is, humans are known to be conscious, while AIs may not be. Because moral value is often linked to consciousness, this argument suggests that ensuring the survival of humans (rather than being replaced by AIs) is crucial for preserving moral value.

While I acknowledge that this is a major argument people often invoke in personal discussions, it does not appear to be strongly supported within effective altruist literature. In fact, I have come across very few articles on the EA Forum or in EA literature that explicitly argue that AIs will not be conscious and then connect this point to the urgency of delaying AI, or reducing AI existential risk. Indeed, I suspect there are many more articles from EAs that argue what is functionally the opposite claim—namely, that AIs will probably be conscious. This is likely due to the popularity of functionalist theories of consciousness among many effective altruists, which suggest that consciousness is determined by computational properties rather than biological substrate. If one accepts this view, then there are few inherent reasons to assume that future AIs would lack consciousness or moral worth.

Another potential argument is that humans are more likely than AIs to pursue goals aligned with utilitarian values, which would make preserving human civilization morally preferable. While this argument is logically coherent, it does not seem to have strong, explicit support in the EA literature—at least, to my knowledge. I have encountered few, if any, rigorous EA analyses that explicitly argue future AIs will likely be less aligned with utilitarian values than humans. Without such an argument, this claim remains little more than an assertion. And if one can simply assert this claim without strong evidence, then one could just as easily assert the opposite—that AIs, on average, might actually be more aligned with utilitarian values than humans—leading to the opposite conclusion.

Either way, such an argument would depend on empirical evidence about the likely distribution of AI goals in the future. In other words, to claim that AIs are less likely than humans to adopt utilitarian values, one would need to provide concrete evidence about what kinds of objectives advanced AIs are actually expected to develop. However, discussions on this topic rarely present detailed empirical analyses of what this distribution of AI goals is likely to look like, making this claim very speculative, and so far, largely unjustified.

Thus, my argument is not that it is logically impossible to construct a utilitarian case for delaying AI in the name of safety. I fully acknowledge that such an argument could be made. However, based on the literature that currently exists supporting the idea of delaying AI development, I suspect that the most common real-world justification that people rely on for this position is not a carefully constructed utilitarian argument. Instead, it appears to rest largely on an implicit speciesist preference for preserving the human species—an assumption that is disconnected from traditional utilitarian principles, which prioritize maximizing well-being for actual individuals, rather than preserving a particular species for its own sake.

15 Reactions

More posts like this

Comments55

Sorted by

New & upvoted

Click to highlight new comments since: Today at 2:26 PM

Ben Millwood🔸Feb 1114

This doesn't seem right to me because I think it's popular among those concerned with the longer term future to expect it to be populated with emulated humans, which clearly isn't a continuation of the genetic legacy of humans, so I feel pretty confident that it's something else about humanity that people want to preserve against AI. (I'm not here to defend this particular vision of the future beyond noting that people like Holden Karnofsky have written about it, so it's not exactly niche.)

You say that expecting AI to have worse goals than humans would require studying things like what the empirical observed goals of AI systems turn out to be, and similar – sure, so in the absence of having done those studies, we should delay our replacement until they can be done. And doing these studies is undermined by the fact that right now the state of our knowledge on how to reliably determine what an AI is thinking is pretty bad, and it will only get worse as they develop their abilities to strategise and lie. Solving these problems would be a major piece of what people are looking for in alignment research, and precisely the kind of thing it seems worth delaying AI progress for.

Matthew_BarnettFeb 11*4

This doesn't seem right to me because I think it's popular among those concerned with the longer term future to expect it to be populated with emulated humans, which clearly isn't a continuation of the genetic legacy of humans, so I feel pretty confident that it's something else about humanity that people want to preserve against AI.

Your point that people may not necessarily care about humanity’s genetic legacy in itself is reasonable. However, if people value simulated humans but not generic AIs, the key distinction they are making still seems to be based on species identity rather than on a principle that a utilitarian, looking at things impartially, would recognize as morally significant.

In this context, “species” wouldn’t be defined strictly in terms of genetic inheritance. Instead, it would encompass a slightly broader concept—one that includes both genetic heritage and the faithful functional replication of biologically evolved beings within a digital medium. Nonetheless, the core element of my thesis remains intact: this preference appears rooted in non-utilitarian considerations.

You say that expecting AI to have worse goals than humans would require studying things like what the empirical observed goals of AI systems turn out to be, and similar – sure, so in the absence of having done those studies, we should delay our replacement until they can be done.

Right now, we lack significant empirical evidence to determine whether AI civilization will ultimately generate more or less valuable than human civilization from a utilitarian point of view. Since we cannot say which is the case, there is no clear reason to default to delaying AI development over accelerating it. If AIs turn out to be generate more moral value, then delaying AI would mean we are actively making a mistake—we would be pushing the future toward a suboptimal state from a utilitarian perspective, by entrenching the human species.

This is because, by assumption, the main effect from delaying AI is to increase the probability that AIs will be aligned with human interests, which is not equivalent to maximizing utilitarian moral value. Conversely, if AIs end up generating less moral value, as many effective altruists currently believe, then delaying AI would indeed be the right call. But since we don’t know which scenario is true, we should acknowledge our uncertainty rather than assume that delaying AI is the obvious default course of action.

Given this uncertainty, the rational approach is to suspend judgment rather than confidently assert that slowing down AI is beneficial. Yet I perceive many EAs as taking the confident approach—acting as if delaying AI is clearly the right decision from a longtermist utilitarian perspective, despite the lack of solid evidence.

Additionally, delaying AI would likely impose significant costs on currently existing humans by delaying technological development, which in my view shifts the default consideration in the opposite direction from what you suggest. This becomes especially relevant for those who do not adhere strictly to total utilitarian longtermism but instead care, at least to some degree, about the well-being of people alive today.

Neel NandaFeb 1113

I think you're using the world utilitarian in a very non standard way here. "AI civilization has comparable moral value to human civilization" is a very strong claim that you don't provide evidence for. You can't just call this speciesism and shift the burden of proof! At the very least, we should have wide error bars over the ratio of moral value between AIs and humans, and I would argue also whether AIs have moral value at all.

I personally am happy to bite the bullet and say that I morally value human civilization continuing over an AI civilization that killed all of humanity, and that this is a significant term in my utility function.

Matthew_BarnettFeb 118

In the absence of meaningful evidence about the nature of AI civilization, what justification is there for assuming that it will have less moral value than human civilization—other than a speciesist bias? While I agree that there is great uncertainty, your argument appears to be entirely symmetric. AI civilization could turn out to be far more morally valuable than human civilization, or it could be far less valuable, from a utilitarian perspective. Both possibilities seem plausible, since we have little information either way. Given this vast uncertainty, there is no clear reason to default to one assumption over the other. In such a situation, the best response is not to commit to a particular stance as the default, but rather to suspend judgment until stronger evidence emerges.

I personally am happy to bite the bullet and say that I morally value human civilization continuing over an AI civilization that killed all of humanity, and that this is a significant term in my utility function.

I'm glad to see you explicitly acknowledge that you accept the implications regarding the value of human civilization. However, your statement here is a bit ambiguous—when considering what to prioritize, do you place greater value on the survival of the human species as a whole or on the well-being and preservation of the humans who are currently alive? Personally, my intuitions lean more strongly toward prioritizing the latter.

Habryka [Deactivated]Feb 135

In the absence of meaningful evidence about the nature of AI civilization, what justification is there for assuming that it will have less moral value than human civilization—other than a speciesist bias?

You know these arguments! You have heard them hundreds of times. Humans care about many things. Sometimes we collapse that into caring about experience for simplicity.

AIs will probably not care about the same things, as such, the universe will be worse by our lights if controlled by AI civilizations. We don't know what exactly those things are, but the only pointer to our values that we have is ourselves, and AIs will not share those pointers.

Matthew_BarnettFeb 139

I think your response largely assumes a human-species-centered viewpoint, rather than engaging with my critique that is precisely aimed at re-evaluating this very point of view.

You say, “AIs will probably not care about the same things, so the universe will be worse by our lights if controlled by AI.” But what are "our lights" and "our values" in this context? Are you referring to the values of me as an individual, the current generation of humans, or humanity as a broad, ongoing species-category? These are distinct—and often conflicting—sets of values, preferences, and priorities. It’s possible, indeed probable, that I, personally, have preferences that differ fundamentally from the majority of humans. "My values" are not the same as "our values".

When you talk about whether an AI civilization is “better” or “worse,” it’s crucial to clarify what perspective we’re measuring that from. If, from the outset, we assume that human values, or the survival of humanity-as-a-species, is the critical factor that determines whether an AI civilization is better or worse than our own, that effectively begs the question. It merely assumes what I aim to challenge. From a more impartial standpoint, the mere fact that AI might not care about the exact same things humans do doesn’t necessarily entail a decrease in total impartial moral value—unless we’ve already decided in advance that human values are inherently more important.

(To make this point clearer, perhaps replace all mentions of "human values" with "North American values" in the standard arguments about these issues, and see if it makes these arguments sound like they privilege an arbitrary category of beings.)

While it’s valid to personally value the continuation of the human species, or the preservation of human values, as a moral preference above other priorities, my point is simply that that’s precisely the species-centric assumption I’m highlighting, rather than a distinct argument that undermines my observations or analysis. Such a perspective is not substrate or species-neutral. Nor is it obviously mandated by a strictly utilitarian framework; it’s an extra premise that privileges the category "humankind" for its own sake. You may believe that such a preference is natural or good from your own perspective, but that is not equivalent to saying that it is the preference of an impartial utilitarian, who would, in theory, make no inherent distinction based purely on species, or substrate.

Neel NandaFeb 1311

Are you assuming some kind of moral realism here? That there's some deep moral truth, humans may or may not have insight into it, so any other intelligent entity is equally likely to?

If so, idk, I just reject your premise. I value what I chose to value, which is obviously related to human values, and an arbitrary sampled entity is not likely to be better on that front

Habryka [Deactivated]Feb 134

Yeah, this.

From my perspective "caring about anything but human values" doesn't make any sense. Of course, even more specifically, "caring about anything but my own values" also doesn't make sense, but in as much as you are talking to humans, and making arguments about what other humans should do, you have to ground that in their values and so it makes sense to talk about "human values".

The AIs will not share the pointer to these values, in the same way as every individual does to their own values, and so we should a-priori assume the AI will do worse things after we transfer all the power from the humans to the AIs.

Matthew_BarnettFeb 132

Let's define "shumanity" as the set of all humans who are currently alive. Under this definition, every living person today is a "shuman," but our future children may not be, since they do not yet exist. Now, let's define "humanity" as the set of all humans who could ever exist, including future generations. Under this broader definition, both we and our future children are part of humanity.

If all currently living humans (shumanity) were to die, this would be a catastrophic loss from the perspective of shuman values—the values held by the people who are alive today. However, it would not necessarily be a catastrophic loss from the perspective of human values—the values of humanity as a whole, across time. This distinction is crucial. In the normal course of events, every generation eventually grows old, dies, and is replaced by the next. When this happens, shumanity, as defined, ceases to exist, and as such, shuman values are lost. However, humanity continues, carried forward by the new generation. Thus, human values are preserved, but not shuman values.

Now, consider this in the context of AI. Would the extinction of shumanity by AIs be much worse than the natural generational cycle of human replacement? In my view, it is not obvious that being replaced by AIs would be much worse than being replaced by future generations of humans. Both scenarios involve the complete loss of the individual values held by currently living people, which is undeniably a major loss. To be very clear, I am not saying that it would be fine if everyone died. But in both cases, something new takes our place, continuing some form of value, mitigating part of the loss. This is the same perspective I apply to AI: its rise might not necessarily be far worse than the inevitable generational turnover of humans, which equally involves everyone dying (which I see as a bad thing!). Maybe "human values" would die in this scenario, but this would not necessarily entail the end of the broader concept of impartial utilitarian value. This is precisely my point.

Habryka [Deactivated]Feb 136

Now, consider this in the context of AI. Would the extinction of shumanity by AIs be much worse than the natural generational cycle of human replacement?

I think the answer to this is "yes", because your shared genetics and culture create much more robust pointers to your values than we are likely to get with AI.

Additionally, even if that wasn't true, humans alive at present have obligations inherited from the past and relatedly obligations to the future. We have contracts and inheritance principles and various things that extend our moral circle of concern beyond just the current generation. It is not sufficient to coordinate with just the present humans, we are engaging in at least some moral trade with future generations, and trading away their influence to AI systems is also not something we have the right to do.

(Importantly, I think we have many fewer such obligations to very distant generations, since I don't think we are generally borrowing or coordinating with humans living in the far future very much).

From a more impartial standpoint, the mere fact that AI might not care about the exact same things humans do doesn’t necessarily entail a decrease in total impartial moral value—unless we’ve already decided in advance that human values are inherently more important.

Look, this sentence just really doesn't make any sense to me. From the perspective of humanity, which is composed of many humans, of course the fact that AI does not care about the same things as humans creates a strong presumption that a world optimized for those values will be worse than a world optimized for human values. Yes, current humans are also limited to what degree we successfully can delegate the fulfillment of our values to future generations, but we also just share, on-average, a huge fraction of our values with future generations. That is a struggle every generation faces, and you are just advocating for... total defeat being fine for some reason? Yes, it would be terrible if the next generation of humans suddenly did not care about almost anything I cared about, but that is very unlikely to happen, but it is quite likely to happen with AI systems.

Neel NandaFeb 145

Because there is a much higher correlation between the value of the current generation of humans and the next one than there is between the values of humans and arbitrary AI entities

Matthew_BarnettFeb 152

I'm not talking about "arbitrary AI entities" in this context, but instead, the AI entities who will actually exist in the future, who will presumably be shaped by our training data, as well as our training methods. From this perspective, it's not clear to me that your claim is true. But even if your claim is true, I was actually making a different point. My point was instead that it isn't clear that future generations of AIs would be much worse than future generations of humans from an impartial utilitarian point of view.

(That said, it sounds like the real crux between us might instead be about whether pausing AI would be very costly to people who currently exist. If indeed you disagree with me about this point, I'd prefer you reply to my other comment rather than replying to this one, as I perceive that discussion as likely to be more productive.)

Matthew_BarnettFeb 134

I don’t subscribe to moral realism. My own ethical outlook is a blend of personal attachments—my own life, my family, my friends, and other living humans—as well as a broader utilitarian concern for overall well-being. In this post, I focused on impartial utilitarianism because that’s the framework most often used by effective altruists.

However, to the extent that I also have non-utilitarian concerns (like caring about specific people I know), those concerns incline me away from supporting a pause on AI. If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term. A more complete, and more rigorous version of this argument was outlined in the post.

What I find confusing about other EA's views, including yours, is why we would assign such great importance to “human values” as something specifically tied to the human species as an abstract concept, rather than merely being partial to actual individuals who exist. This perspective is neither utilitarian, nor is it individualistic. It seems to value the concept of the human species over and above the actual individuals that comprise the species, much like how an ideological nationalist might view the survival of their nation as more important than the welfare of all the individuals who actually reside within the nation.

Neel NandaFeb 146

For your broader point of impartiality, I feel like you are continuing to assume some bizarre form of moral realism and I don't understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable? I can come up with reasons, but these are assuming specific things about what is and is not morally valuable in exactly the same way that when I say arbitrary AI beings are on average substantially less valuable because I have specific preferences and values over what matters. I do not understand the philosophical position you are taking here - it feels like you're saying that the standard position is speciesist and arbitrary and then drawing an arbitrary distinction slightly further out?

Matthew_BarnettFeb 142

For your broader point of impartiality, I feel like you are continuing to assume some bizarre form of moral realism and I don't understand the case. Otherwise, why do you not consider rocks to be morally meaningful? Why is a plant not valuable?

Traditionally, utilitarianism regards these things (rocks and plants) as lacking moral value because they do not have well-being or preferences. This principle does not clearly apply to AI, though it's possible that you are making the assumption that future AIs will lack sentience or meaningful preferences. It would be helpful if you clarified how you perceive me to be assuming a form of moral realism (a meta-ethical theory), as I simply view myself as applying a standard utilitarian framework (a normative theory).

I do not understand the philosophical position you are taking here - it feels like you're saying that the standard position is speciesist and arbitrary and then drawing an arbitrary distinction slightly further out?

Standard utilitarianism recognizes both morally relevant and morally irrelevant distinctions in value. According to a long tradition, following Jeremy Bentham and Peter Singer, among others, the species category is considered morally irrelevant, whereas sentience and/or preferences are considered morally relevant. I do not think this philosophy rests on the premise of moral realism: rather, it's a conceptual framework for understanding morality, whether from a moral realist or anti-realist point of view.

To be clear, I agree that utilitarianism is itself arbitrary, from a sufficiently neutral point of view. But it's also a fairly standard ethical framework, not just in EA but in academic philosophy too. I don't think I'm making very unusual assumptions here.

Neel NandaFeb 172

Ah! Thanks for clarifying - if I understand correctly, you think that it's reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not? I personally just disagree with that premise, but I can see where you're coming from

But in that case, it's highly non obvious to me that AIs will have sentience or preferences in ways that I consider meaningful - this seems like an open philosophical question. Defining actually what they are also seems like an open question to me - does a thermostat have preferences? Does a plant that grows towards the light? While I do feel fairly confident humans are morally meaningful. Is your argument that even if there's a good chance they're not morally meaningful, the expected amount of moral significance is comparable to humans?

Matthew_BarnettFeb 172

Thanks for clarifying - if I understand correctly, you think that it's reasonable to assert that sentience and preferences are what makes an entity morally meaningful, but that anything more specific is not?

I don't think there's any moral view that's objectively more "reasonable" than any other moral view (as I'm a moral anti-realist). However, I personally don't have a significant moral preference for humans beyond the fact that I am partial to my family, friends, and a lot of other people who are currently alive. When I think about potential future generations who don't exist yet, I tend to adopt a more impartial, utilitarian framework.

In other words, my moral views can be summarized as a combination of personal attachments and broader utilitarian moral concerns. My personal attachments are not impartial: for example, I care about my family more than I care about random strangers. However, beyond my personal attachments, I tend to take an impartial utilitarian approach that doesn't assign any special value to the human species.

In other words, to the extent I care about humans specifically, this concern merely arises from the fact that I'm attached to some currently living individuals who happen to be human—rather than because I think the human species is particularly important.

Does that make sense?

But in that case, it's highly non obvious to me that AIs will have sentience or preferences in ways that I consider meaningful - this seems like an open philosophical question. Defining actually what they are also seems like an open question to me - does a thermostat have preferences?

I agree this is an open question, but I think it's much clearer that future AIs will have complex and meaningful preferences compared to a thermostat or a plant. I think we can actually be pretty confident about this prediction given the strong economic pressures that will push AIs towards being person-like and agentic. (Note, however, that I'm not making a strong claim here that all AIs will be moral patients in the future. It's sufficient for my argument if merely a large number of them are.)

In fact, a lot of arguments for AI risk rest on the premise that AI agents will exist in the future, and that they'll have certain preferences (at least in a functional sense). If we were to learn that future AIs won't have preferences, that would both undermine these arguments for AI risk, and many of my moral arguments for valuing AIs. Therefore, to the extent you think AIs will lack the cognitive prerequisites for moral patienthood—under my functionalist and preference utilitarian views—this doesn't necessarily translate into a stronger case for worrying about AI takeover.

However, I want to note that the view I have just described is actually broader than the thesis I gave in the post. If you read my post carefully, you'll see that I actually hedged quite a bit by saying that there are potential, logically consistent utilitarian arguments that could be made in favor of pausing AI. My thesis in the post was not that such an argument couldn't be given. It was actually a fairly narrow thesis, and I didn't make a strong claim that AI-controlled futures would create about as much utilitarian moral value as human-controlled futures in expectation (even though I personally think this claim is plausible).

Neel NandaFeb 172

I think that even the association between functional agency and preferences in a morally valuable sense is an open philosophical question that I am not happy taking as a given.

Regardless, it seems like our underlying crux is that we assign utility to different things. I somewhat object to you saying that your version of this is utilitarianism and notions of assigning utility that privilege things humans value are not

Matthew_BarnettFeb 172

Regardless, it seems like our underlying crux is that we assign utility to different things. I somewhat object to you saying that your version of this is utilitarianism and notions of assigning utility that privilege things humans value are not

I agree that our main point of disagreement seems to be about what we ultimately care about.

For what it's worth, I didn’t mean to suggest in my post that my moral perspective is inherently superior to others. For example, my argument is fully compatible with someone being a deontologist. My goal was simply to articulate what I saw standard impartial utilitarianism as saying in this context, and to point out how many people's arguments for AI pause don't seem to track what standard impartial utilitarianism actually says. However, this only matters insofar as one adheres to that specific moral framework.

As a matter of terminology, I do think that the way I'm using the words "impartial utilitarianism" aligns more strongly with common usage in academic philosophy, given the emphasis that many utilitarians have placed on antispeciesist principles. However, even if you think I'm wrong on the grounds of terminology, I don't think this disagreement subtracts much from the substance of my post as I'm simply talking about the implications of a common moral theory (regardless of whatever we choose to call it).

Neel NandaFeb 172

Thanks for clarifying. In that case I think that we broadly agree

Neel NandaFeb 144

If AI can accelerate technologies that save and improve the lives of people who exist right now, then slowing it down would cost lives in the near term.

Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower. I disagree with both of these assumptions. Which part of your post are you referring to?

Matthew_BarnettFeb 15*14

Huh? This argument only goes through if you have a sufficiently low probability of existential risk or an extremely low change in your probability of existential risk, conditioned on things moving slower.

This claim seems false, though its truth hinges on what exactly you mean by a "sufficiently low probability of existential risk" and "an extremely low change in your probability of existential risk".

To illustrate why I think your claim is false, I'll perform a quick calculation. I don't know your p(doom), but in a post from three years ago, you stated,

If you believe the key claims of "there is a >=1% chance of AI causing x-risk and >=0.1% chance of bio causing x-risk in my lifetime" this is enough to justify the core action relevant points of EA.

Let's assume that there's a 2% chance of AI causing existential risk, and that, optimistically, pausing for a decade would cut this risk in half (rather than barely decreasing it, or even increasing it). This would imply that the total risk would diminish from 2% to 1%.

According to OWID, approximately 63 million people die every year, although this rate is expected to increase, rising to around 74 million in 2035. If we assume that around 68 million people will die per year during the relevant time period, and that they could have been saved by AI-enabled medical progress, then pausing AI for a decade would kill around 680 million people.

This figure is around 8.3% of the current global population, and would constitute a death count higher than the combined death toll from World War 1, World War 2, the Mongol Conquests, the Taiping rebellion, the Transition from Ming to Qing, and the Three Kingdoms Civil war.

(Note that, although we are counting deaths from old age in this case, these deaths are comparable to deaths in war from a years of life lost perspective, if you assume that AI-accelerated medical breakthroughs will likely greatly increase human lifespan.)

From the perspective of an individual human life, a 1% chance of death from AI is significantly lower than a 8.3% chance of death from aging—though obviously in the former case this risk would apply independently of age, and in the latter case, the risk would be concentrated heavily among people who are currently elderly.

Even a briefer pause lasting just two years, while still cutting risk in half, would not survive this basic cost-benefit test. Of course, it's true that it's difficult to directly compare the individual personal costs from AI existential risk to the diseases of old age. For example, AI existential risk has the potential to be briefer and less agonizing, which, all else being equal, should push us to favor it. On the other hand, most people might consider death from old age to be preferable since it's more natural and allows the human species to continue.

Nonetheless, despite these nuances, I think the basic picture that I'm presenting holds up here: under typical assumptions (such as the ones you gave three years ago), a purely individualistic framing of the costs and benefits of AI pause do not clearly favor pausing, from the perspective of people who currently exist. This fact was noted in Nick Bostrom's original essay on Astronomical Waste, and more recently, by Chad Jones in his paper on the tradeoffs involved in stopping AI development.

Neel NandaFeb 172

Ah, gotcha. Yes, I agree that if your expected reduction in p(doom) is less than around 1% per year of pause, and you assign zero value to future lives, then pausing is bad on utilitarian grounds

Note that my post was not about my actual numerical beliefs, but about a lower bound that I considered highly defensible - I personally expect notably higher than 1%/year reduction and was taking that as given, but on reflection I at least agree that that's a more controversial belief (I also think that a true pause is nigh impossible)

I expect there are better solutions that achieve many of the benefits of pausing while still enabling substantially better biotech research, but that's nitpicking

I'm not super sure what you mean by individualistic. I was modelling this as utilitarian but assigning literally zero value to future people. From a purely selfish perspective, I'm in my mid-20s and my chances of dying from natural causes in the next say 20 years are pretty damn low, and this means that given my background beliefs about doom and timelines, slowing down AI is great deal from my perspective. While if I expected to die from old age in the next 5 years I would be a lot more opposed

Matthew_BarnettFeb 174

I'm not super sure what you mean by individualistic. I was modelling this as utilitarian but assigning literally zero value to future people. From a purely selfish perspective, I'm in my mid-20s and my chances of dying from natural causes in the next say 20 years are pretty damn low, and this means that given my background beliefs about doom and timelines, slowing down AI is great deal from my perspective. While if I expected to die from old age in the next 5 years I would be a lot more opposed

A typical 25 year old man in the United States has around a 4.3% chance of dying before they turn 45 according to these actuarial statistics from 2019 (the most recent non-pandemic year in the data). I wouldn't exactly call that "pretty damn low", though opinions on these things differ. This is comparable to my personal credence that AIs will kill me in the next 20 years. And if AI goes well, it will probably make life really awesome. So from this narrowly selfish point of view I'm still not really convinced pausing is worth it.

Perhaps more importantly: do you not have any old family members that you care about?

Neel NandaFeb 174

4% is higher than I thought! Presumably much of that is people who had pre-existing conditions which I don't or people who got into eg a car accidents which AI probably somewhat reduces, but this seems a lot more complicated and indirect to me.

But this isn't really engaging with my cruxes. it seems pretty unlikely to me that we will pause until we have pretty capable and impressive AIs and to me much of the non-doom scenarios comes from uncertainty about when we will get powerful ai and how capable it will be. And I expect this to be much clearer the closer we get to these systems, or at the very least the empirical uncertainty about whether it'll happen will be a lot clearer. I would be very surprised if there was the political will to do anything about this before we got a fair bit closer to the really scary systems.

And yep, I totally put more than 4% chance that I get killed by AI in the next 20 years. But I can see this is a more controversial belief and one that requires higher standards of evidence to argue for. If I imagine a hypothetical world where I know that in 2 years we could have aligned super intelligent AI with 98% probability and it would kill us all with 2% probability. Or we could pause for 20 years and that would get it from 98 to 99%, then I guess from a selfish perspective I can kind of see your point. But I know I do value humanity not going extinct a fair amount even if I think that total utilitarianism is silly. But I observe that I'm finding this debate kind of slippery and I'm afraid that I'm maybe moving the goalposts here because I disagree on many counts so it's not clear what exactly my cruxes are, or where I'm just attacking points in what you say that seem off

I do think that the title of your post is broadly reasonable though. I'm an advocate for making AI x-risk cases that are premised on common sense morality like "human extinction would be really really bad", and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn't something that I fully trust in extreme situations

Matthew_BarnettFeb 172

I think what you're saying about your own personal tradeoffs makes a lot of sense. Since I think we're in agreement on a bunch of points here, I'll just zero in on your last remark, since I think we still might have an important lingering disagreement:

I do think that the title of your post is broadly reasonable though. I'm an advocate for making AI x-risk cases that are premised on common sense morality like "human extinction would be really really bad", and utilitarianism in the true philosophical sense is weird and messy and has pathological edge cases and isn't something that I fully trust in extreme situations

I'm not confident, but I suspect that your perception of what common sense morality says is probably a bit inaccurate. For example, suppose you gave people the choice between the following scenarios:

In scenario A, their lifespan, along with the lifespans of everyone currently living, would be extended by 100 years. Everyone in the world would live for 100 years in utopia. At the end of this, however, everyone would peacefully and painlessly die, and then the world would be colonized by a race of sentient aliens.

In scenario B, everyone would receive just 2 more years to live. During this 2 year interval, life would be hellish and brutal. However, at the end of this, everyone would painfully die and be replaced by a completely distinct set of biological humans, ensuring that the human species is preserved.

In scenario A, humanity goes extinct, but we have a good time for 100 years. In scenario B, humanity is preserved, but we all die painfully in misery.

I suspect most people would probably say that scenario A is far preferable to scenario B, despite the fact that in scenario A, humanity goes extinct.

To be clear, I don't think this scenario is directly applicable to the situation with AI. However, I think this thought experiment suggests that, while people might have some preference for avoiding human extinction, it's probably not anywhere near the primary thing that people care about.

Based on people's revealed preferences (such as how they spend their time, and who they spend their money on), most people care a lot about themselves and their family, but not much about the human species as an abstract concept that needs to be preserved. In a way, it's probably the effective altruist crowd that is unusual in this respect by caring so much about human extinction, since most people don't give the topic much thought at all.

Neel NandaFeb 172

This got me curious so I had deep research make me a report on my probability of dying from different causes. It estimates that in the next 20 years I've maybe a 1 and 1/2 to 3% Chance of death, of which 0.5-1% is chronic illness where it'll probably help a lot. Infectious diseases is less than .1%, Doesn't really matter. Accidents are .5 to 1%, AI probably helps but kind of unclear. .5 to 1% on other, mostly suicide. Plausibly AI also leads to substantially improved mental health treatments which helps there? So yeah, I buy that having AGI today Vs in twenty years has small but non trivial costs to my chances of being alive when it happens

DavidmanheimFeb 118

First, Utilitarianism doesn't traditionally require the type of extreme species neutrality you propose here.Singer and many EAs gave a somewhat narrower view of what 'really counts' as Utilitarian, but your argument assumes that narrow view without really justifying it.

Second, you assume future AIs will have rich inner lives that are valuable, instead of paperclipping the universe. You say "one would need to provide concrete evidence about what kinds of objectives advanced AIs are actually expected to develop" - but Eliezer has done that quite explicitly.

Matthew_BarnettFeb 11-5

Brad West🔸Feb 117

The preference for humans remaining alive/in control isn't necessarily speciesist because it's the qualities of having valuable conscious experience and concern for the promotion of valuable as well as avoidance of disvaluable conscious experience that might make one prefer this outcome.

We do not know whether ASI would have these qualities or preferences, but if we could know that it did, you would have a much stronger case for your argument.

Matthew_BarnettFeb 112

Would you say the assumption that advanced AI will not be conscious is a load-bearing premise—meaning that if advanced AIs were shown to be conscious, the case for delaying AI development would collapse?

If this is the case, then I think this premise should be explicitly flagged in discussions and posts about delaying AI. Personally, I don’t find it unlikely that future AIs will be conscious. In fact, many mainstream theories of consciousness suggest that this outcome is likely, such as computationalism and functionalism. This makes the idea of delaying AI appear to rest on a shaky foundation.

Moreover, I have come across very few arguments in EA literature that rigorously try to demonstrate to AIs would not be conscious, and then connect this point to AI risk. As I wrote in the post:

To be clear, I am not denying that one could construct a purely utilitarian argument for why AIs might generate less moral value than humans, and thus why delaying AI could be justified. My main point, however, is that evidence supporting such an argument is rarely made explicit or provided in discussions on this topic.
For instance, one common claim is that the key difference between humans and AIs is consciousness—that is, humans are known to be conscious, while AIs may not be. Because moral value is often linked to consciousness, this argument suggests that ensuring the survival of humans (rather than being replaced by AIs) is crucial for preserving moral value.
While I acknowledge that this is a major argument people often invoke in personal discussions, it does not appear to be strongly supported within effective altruist literature. In fact, I have come across very few articles on the EA Forum or in EA literature that explicitly argue that AIs will not be conscious and then connect this point to the urgency of delaying AI, or reducing AI existential risk. Indeed, I suspect there are many more articles from EAs that argue what is functionally the opposite claim—namely, that AIs will probably be conscious. This is likely due to the popularity of functionalist theories of consciousness among many effective altruists, which suggest that consciousness is determined by computational properties rather than biological substrate. If one accepts this view, then there are few inherent reasons to assume that future AIs would lack consciousness or moral worth.

Brad West🔸Feb 113

Yeah I would think that we would want ASI-entities to (a) have positively valenced experienced as well as the goal of advancing their positively valenced experience (and minimizing their own negatively valenced experience) and/or (b) have the goal of advancing positive valenced experiences of other beings and minimizing negatively valenced experiences.

A lot of the discussion I hear around the importance of "getting alignment right" pertains to lock-in effects regarding suboptimal futures.

Given the probable irreversibility of the fate accompanying ASI and the potential magnitude of good and bad consequences across space and time, trying to maximize the chances of positive outcomes seems simply prudent. Perhaps some of the "messaging" of AI safety seems to be a bit human-centered, because this might be more accessible to more people. But most who have seriously considered a post-ASI world have considered the possibility of digital minds both as moral patients (capable of valenced experience) and as stewards of value and disvalue in the universe.

Matthew_BarnettFeb 11*4

I agree EAs often discuss the importance of "getting alignment right" and then subtly frame this in terms of ensuring that AIs either care about consciousness or possess consciousness themselves. However, the most common explicit justification for delaying AI development is the argument that doing so increases the likelihood that AIs will be aligned with human interests. This distinction is crucial because aligning AI with human interests is not the same as ensuring that AI maximizes utilitarian value—human interests and utilitarian value are not equivalent.

Currently, we lack strong empirical evidence to determine whether AIs will ultimately generate more or less value than humans from a utilitarian point of view. Because we do not yet know which is the case, there is no clear justification for defaulting to delaying AI development rather than accelerating it. If AIs turn out to generate more moral value than humans, then delaying AI would mean we are actively making a mistake—we would be increasing the probability of future human dominance, since by assumption, the main effect from delaying AI is to increase the probability that AIs will be aligned with human interests. This would risk entrenching a suboptimal future.

On the other hand, if AIs end up generating less value, as many effective altruists currently believe, then delaying AI would indeed be the right decision. However, since we do not yet have enough evidence to determine which scenario is correct, we should recognize this uncertainty rather than assume that delaying AI is the obviously preferable, or default course of action.

Brad West🔸Feb 119

Because we face substantial uncertainty around the eventual moral value of AIs, any small reduction in p(doom) or catastrophic outcomes—including S-risks—carries enormous expected utility. Even if delaying AI costs us a few extra years before reaping its benefits (whether enjoyed by humans, other organic species, or digital minds), that near-term loss pales in comparison to the potentially astronomical impact of preventing (or mitigating) disastrous futures or enabling far higher-value ones.

From a purely utilitarian viewpoint, the harm of a short delay is utterly dominated by the scale of possible misalignment risks and missed opportunities for ensuring the best long-term trajectory—whether for humans, other organic species, or digital minds. Consequently, it’s prudent to err on the side of delay if doing so meaningfully improves our chance of securing a safe and maximally valuable future. This would be true regardless of the substrate of consciousness.

Matthew_BarnettFeb 11*2

From a purely utilitarian viewpoint, the harm of a short delay is utterly dominated by the scale of possible misalignment risks and missed opportunities for ensuring the best long-term trajectory—whether for humans, other organic species, or digital minds. Consequently, it’s prudent to err on the side of delay if doing so meaningfully improves our chance of securing a safe and maximally valuable future.

Your argument appears to assume that, in the absence of evidence about what goals future AI systems will have, delaying AI development should be the default position to mitigate risk. But why should we accept this assumption? Why not consider acceleration just as reasonable a default? If we lack meaningful evidence about the values AI will develop, then we have no more justification for assuming that delay is preferable than we do for assuming that acceleration is.

In fact, one could just as easily argue the opposite: that AI might develop moral values superior to those of humans. This claim appears to have about as much empirical support as the assumption that AI values will be worse. This argument could then justify accelerating AI rather than delaying it. Using the same logic that you just applied, one could make a symmetrical counterargument against your position: that accelerating AI is actually the correct course of action, since any minor harms caused by moving forward are vastly outweighed by the long-term risk of locking in suboptimal values through unnecessary delay. Delaying AI development would, in this context, risk entrenching human values, which are suboptimal to the default AI values that we would get through accelerating.

You might think that even weak evidence in favor of delaying AI is sufficient to support this strategy as the default course of action. But this would seem to assume a "knife’s edge" scenario, where even a slight epistemic advantage—such as a 51% chance that delay is beneficial versus a 49% chance that acceleration is beneficial—should be enough to justify committing to a pause. If we adopted this kind of reasoning in other domains, we would quickly fall into epistemic paralysis, constantly shifting strategies based on fragile, easily reversible analysis.

Given this high level of uncertainty about AI’s future trajectory, I think the best approach is to focus on the most immediate and concrete tradeoffs that we can analyze with some degree of confidence. This includes whether delaying or accelerating AI is likely to be more beneficial to the current generation of humans. However, based on the available evidence, I believe that accelerating AI—rather than delaying it—is likely the better choice, as I highlight in my post.

Kaspar BrandnerFeb 12*4

For preference utilitarianism, there aren't any fundamentally immoral "speciesist preferences". Preferences just are what they are, and existing humans clearly have a strong and overwhelming-majority preference for humanity to continue to exist in the future. Do we have to weigh these preferences against the preferences of potential future AIs to exist, on pain of speciesism? No, because those AIs do not now exist, and non-existing entities do not have any preferences, nor will they have any if we don't create them. So not creating them isn't bad for them. Something could only be bad for them if they existed. This is called the procreation asymmetry. There are strong arguments for the procreation asymmetry being correct, see e.g. here.

The case is similar to a couple which is about to decide whether to have a baby or get a robot. The couple strongly prefers having the baby. Now, both not creating the baby and not creating the robot isn't bad for the robot nor the baby, since neither would suffer their non-existence. However, there is still a reason to create the baby specifically: The parents want to have one. Not having a baby wouldn't be bad for the non-existent baby, but it would be bad for the parents. So the extinction of humanity is bad because we don't want humanity to go extinct.

Matthew_BarnettFeb 122

Preferences just are what they are, and existing humans clearly have a strong and overwhelming-majority preference for humanity to continue to exist in the future. [...] So the extinction of humanity is bad because we don't want humanity to go extinct.

This argument appears very similar to the one I addressed in the essay about how delaying or accelerating AI will impact the well-being of currently existing humans. My claim is not that it isn't bad if humanity goes extinct; I am certainly not saying that it would be good if everyone died. Rather, my claim is that, if your reason for caring about human extinction arises from a concern for the preferences of the existing generation of humans, then you should likely push for accelerating AI so long as the probability of human extinction from AI is fairly low.

I'll quote the full argument below:

Of course, one can still think—as I do—that human extinction would be a terrible outcome for the people who are alive when it occurs. Even if the AIs that replace us are just as morally valuable as we are from an impartial moral perspective, it would still be a moral disaster for all currently existing humans to die. However, if we accept this perspective, then we must also acknowledge that, from the standpoint of people living today, there appear to be compelling reasons to accelerate AI development rather than delay it for safety reasons.
The reasoning is straightforward: if AI becomes advanced enough to pose an existential threat to humanity, then it would almost certainly also be powerful enough to enable massive technological progress—potentially revolutionizing medicine, biotechnology, and other fields in ways that could drastically improve and extend human lives. For example, advanced AI could help develop cures for aging, eliminate extreme suffering, and significantly enhance human health through medical and biological interventions. These advancements could allow many people who are alive today to live much longer, healthier, and more fulfilling lives.
As economist Chad Jones has pointed out, delaying AI development means that the current generation of humans risks missing out on these transformative benefits. If AI is delayed for years or decades, a large fraction of people alive today—including those advocating for AI safety—would not live long enough to experience these life-extending technologies. This leads to a strong argument for accelerating AI, at least from the perspective of present-day individuals, unless one is either unusually risk-averse, or they have a very high confidence (such as above 50%) that AI will lead to human extinction.
To be clear, if someone genuinely believes there is a high probability that AI will wipe out humanity, then I agree that delaying AI would seem rational, since the high risk of personal death would outweigh the small possibility of a dramatically improved life. But for those who see AI extinction risk as relatively low (such as below 15%), accelerating AI development appears to be the more pragmatic personal choice.
Thus, while human extinction would undoubtedly be a disastrous event, the idea that even a small risk of extinction from AI justifies delaying its development—even if that delay results in large numbers of currently existing humans dying from preventable causes—is not supported by straightforward utilitarian reasoning. The key question here is what extinction actually entails. If human extinction means the total disappearance of all complex life and the permanent loss of all future value, then mitigating even a small risk of such an event might seem overwhelmingly important. However, if the outcome of human extinction is simply that AIs replace humans—while still continuing civilization and potentially generating vast amounts of moral value—then the reasoning behind delaying AI development changes fundamentally.
In this case, the clearest and most direct tradeoff is not about preventing "astronomical waste" in the classic sense (i.e., preserving the potential for future civilizations) but rather about whether the risk of AI takeover is acceptable to the current generation of humans. In other words, is it justifiable to impose costs on presently living people—including delaying potentially life-saving medical advancements—just to reduce a relatively small probability that humanity might be forcibly replaced by AI? This question is distinct from the broader existential risk arguments that typically focus on preserving all future potential value, and it suggests that delaying AI is not obviously justified by utilitarian logic alone.

Kaspar BrandnerFeb 121

This argument appears very similar to the one I addressed in the essay about how delaying or accelerating AI will impact the well-being of currently existing humans. My claim is not that it isn't bad if humanity goes extinct; I am certainly not saying that it would be good if everyone died.

I'm not supposing you do. Of course most people have a strong preference not to die. But there is also (beyond that) a widespread preference for humanity not to go extinct. This is why it e.g. would be so depressing (as in the movie Children of Men) when a global virus made all humans infertile. Ending humanity is very different from and much worse than people merely dying at the end of their lives, which by itself doesn't imply extinction. Many people would likely even sacrifice their own life in order to safe the future of humanity. We don't have a similar preference for having AI descendants. That's not speciesist, it's just what our preferences are.

Matthew_BarnettFeb 122

We can assess the strength of people's preferences for future generations by analyzing their economic behavior. The key idea is that if people genuinely cared deeply about future generations, they would prioritize saving a huge portion of their income for the benefit of those future individuals rather than spending it on themselves in the present. This would indicate a strong intertemporal preference for improving the lives of future people over the well-being of currently existing individuals.

For instance, if people truly valued humanity as a whole far more than their own personal well-being, we would expect parents to allocate the vast majority of their income to their descendants (or humanity collectively) rather than using it for their own immediate needs and desires. However, empirical studies generally do not support the claim that people place far greater importance on the long-term preservation of humanity than on the well-being of currently existing individuals. In reality, most people tend to prioritize themselves and their children, while allocating only a relatively small portion of their income to charitable causes or savings intended to benefit future generations beyond their immediate children. If people were intrinsically and strongly committed to the abstract concept of humanity itself, rather than primarily concerned with the welfare of present individuals (including their immediate family and friends), we would expect to see much higher levels of long-term financial sacrifice for future generations than we actually observe.

To be clear, I'm not claiming that people don’t value their descendants, or the concept of humanity at all. Rather, my point is that this preference does not appear to be strong enough to override the considerations outlined in my previous argument. While I agree that people do have an independent preference for preserving humanity—beyond just their personal desire to avoid death—this preference is typically not way stronger than their own desire for self-preservation. As a result, my previous conclusion still holds: from the perspective of present-day individuals, accelerating AI development can still be easily justified if one does not believe in a high probability of human extinction from AI.

Kaspar BrandnerFeb 131

The economic behavior analysis falls short. People usually do not expect to have a significant impact on the survival of humanity. If in the past centuries people had saved a large part of their income for "future generations" (including for us) this would likely have had almost no impact on the survival of humanity, probably not even significantly on our present quality of life. The expected utility of saving money for future generations is simply too low compared to spending the money in the present for themselves. This does just mean that people (reasonably) expect to have little influence on the survival of humanity, not that they are relatively okay with humanity going extinct. If people could somehow directly influence, via voting perhaps, whether to trade a few extra years of life against a significant increase in the likelihood of humanity going extinct, I think the outcome would be predictable.

Though I'm indeed not specifically commenting here on what delaying AI could realistically achieve. My main point was only that the preferences for humanity not going extinct are significant and that they easily outweigh any preferences for future AI coming into existence, without relying on immoral speciesism.

David Mathers🔸Feb 122

I don't think you can get from the procreation asymmetry to only current and not future preferences matter. Even if you think that people being brought into existence and having their preferences fulfilled has no greater value than them not coming into existence, you might still want to block the existence of unfulfilled future preferences. Indeed, it seems any sane view has to accept that harms to future people if they do exist are bad, otherwise it would be okay to bring about unlimited future suffering, so long as the people who will suffer don't exist yet.

Kaspar BrandnerFeb 121

Not coming into existence would not be a future harm to the person that doesn't come into existence, because in that case it not only doesn't exist, it also won't exist. That's different from a person that would suffer from something, because in that case it would exist.

David Mathers🔸Feb 122

My point is that even if you believe in the assymetry you should still care whether humans or AIs being in charge leads to higher utility for those who do exist, even if you are indifferent between either of those outcomes and neither humans nor AIs existing in the future.

Kaspar BrandnerFeb 121

Yes, though I don't think that contradicts anything I said originally.

David Mathers🔸Feb 122

It shows that just being person-affecting doesn't mean that you can argue that since current human preferences are the only ones that exist now, and they are against extinction, person-affecting utilitarians don't have to compare what a human-ruled future would be like to what an AI would be like, when deciding whether AIs replacing humans would be net bad from a utilitarian perspective. But maybe I was wrong to read you as denying that.

Kaspar BrandnerFeb 121

No here you seem to contradict the procreation asymmetry. When deciding whether we should create certain agents, we wouldn't harm them if we decide against creating them. Even if the AIs would be happier than the humans.

David Mathers🔸Feb 122

By creating certain agents in a scenario where it is (basically) guaranteed that there are some agents or other, we determine the amount of unfulfilled preferences in the future. Sensible person-affecting views still prefer agent-creating decision that lead to less frustrated existing future preferences over more existing future preferences.

EDIT: Look at it this way: we are not choosing between futures with zero subjects of welfare and futures with non-zero, where person-affecting views are indeed indifferent, so long as the future with subjects has net-positive utility. Rather we are choosing between two agent-filled futures: one with human agents and another with AIs. Sensible person-affecting views prefer the future with less unfulfilled preferences over the one with more, when both futures contain agents. So to make a person-affecting case against AIs replacing humans, you need to take into account whether AIs replacing humans leads to more/less frustrated preferences existing in the future, not just whether it frustrates the preferences of currently existing agents.

Kaspar BrandnerFeb 121

I disagree. If we have any choice at all over which future populations to create, we also have the option to not creating any descendants at all. Which would be advisable e.g. if we had reason to think both humans and AIs would have net bad lives in expectation.

Dan_KeysFeb 134

Nate Soares' take here was that an AI takeover would most likely lead to an "unconscious meh" scenario, where "The outcome is worse than the “Pretty Good” scenario, but isn’t worse than an empty universe-shard" and "there’s little or no conscious experience in our universe-shard’s future. E.g., our universe-shard is tiled with tiny molecular squiggles (a.k.a. “molecular paperclips”)." Whereas humanity boosted by ASI would probably lead to a better outcome.

That was also the most common view in the polls in the comments there.

SummaryBotFeb 121

Executive summary: The standard argument for delaying AI development, often framed as a utilitarian effort to reduce existential risk, implicitly prioritizes the survival of the human species itself rather than maximizing well-being across all sentient beings, making it inconsistent with strict utilitarian principles.

Key points:

While delaying AI is often justified by the utilitarian astronomical waste argument, this reasoning assumes that AI-driven human extinction equates to total loss of future value, which is not necessarily true.
If advanced AIs continue civilization and generate moral value, then human extinction is distinct from total existential catastrophe, making species survival a non-utilitarian concern.
The argument for delaying AI often rests on an implicit speciesist preference for human survival, rather than on clear evidence that AI would produce less moral value than human-led civilization.
A consistent utilitarian view would give moral weight to all sentient beings, including AIs, and would not inherently favor human control over the future.
If AI development is delayed, present-day humans may miss out on significant benefits, such as medical breakthroughs and life extension, which creates a direct tradeoff.
While a utilitarian case for delaying AI could exist (e.g., if AIs were unlikely to be conscious or morally aligned), such arguments are rarely explicitly made or substantiated in EA discussions.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.