Do you have any thoughts on the argument I recently gave that gradual and peaceful human disempowerment could be a good thing from an impartial ethical perspective?
Historically, it is common for groups to decline in relative power as a downstream consequence of economic growth and technological progress. A chief example is the aristocracy, whose influence declined as a consequence of the Industrial Revolution. Yet this transformation is generally not considered a bad thing, for two reasons. Firstly, since the world is not zero-sum, individual aristocrats did not necessarily experience declining well-being despite the relative disempowerment of their class as a whole. Secondly, the world does not merely consist of aristocrats, but rather contains a multitude of moral patients whose agency deserves respect from the perspective of an impartial utilitarian. In particular, non-aristocrats were largely made better off by industrial development.
Applying this analogy to the present situation with AI, my argument is that even if AIs pursue separate goals from humans and increase in relative power over time, they will not necessarily make individual humans worse off, since the world is not zero sum. In other words, there is ample opportunity for peaceful and mutually beneficial trade with AIs that do not share our utility functions, which would make both humans and AIs better off. Moreover, AIs themselves may be moral patients whose agency should be given consideration. Just as most of us think it is good that human children are allowed to grow, develop into independent people, and pursue their own goals—as long as this is done peacefully and lawfully—agentic AIs should be allowed to do the same. There seems to be a credible possibility of a flourishing AI civilization in the future, even if humans are relatively disempowered, and this outcome could be worth pushing for.
From a preference utilitarian perspective, it is quite unclear that we should prioritize human welfare at all costs. The boundary between biological minds and silicon-based minds seems quite arbitrary from an impartial point of view, making it a fragile foundation for developing policy. There are much more plausible moral boundaries—such as the distinction between sentient minds and non-sentient minds—which do not cut cleanly between humans and AIs. Therefore, framing the discussion solely in terms of human disempowerment seems like a mistake to me.
What's the evidence for this? I think even if it is true, it is probably misleading, in that most leftists also just reject the claims mainstream economists make about when taxing the rich will reduce aggregate welfare.
To support this claim, we can examine the work of analytical anticapitalists such as G. A. Cohen and John E. Roemer. Both of these thinkers have developed their critiques of capitalism from a foundation of egalitarianism rather than from a perspective primarily concerned with maximizing overall social welfare. Their theories focus on issues of fairness, justice, and equality rather than on the utilitarian consequences of different economic systems.
Similarly, widely cited figures such as Thomas Piketty and John Rawls have provided extensive critiques of capitalist systems, and their arguments are largely framed in terms of egalitarian concerns. Both have explicitly advocated for significant wealth redistribution, even when doing so might lead to efficiency losses or other negative utilitarian tradeoffs. Their work illustrates a broader trend in which anticapitalist arguments are often motivated more by ethical commitments to equality than by a strict adherence to utilitarian cost-benefit analysis.
Outside of academic discourse, the distinction becomes less clear. This is because most people do not explicitly frame their economic beliefs within formal theoretical frameworks, making it harder to categorize their positions precisely. I also acknowledge your point that many socialists would likely disagree with my characterization by denying the empirical premise that wealth redistribution can reduce aggregate utilitarian welfare. But this isn't very compelling evidence in my view, as it is common for people of all ideologies to simply deny the tradeoffs inherent in their policy proposals.
What I find most compelling here is that, based on my experience, the vast majority of anticapitalists do not ground their advocacy in a framework that prioritizes maximizing utilitarian welfare. While they may often reference utilitarian concerns in passing, it is uncommon for them to fully engage with mainstream economic analyses of the costs of taxation and redistribution. When anticapitalists do acknowledge these economic arguments, they tend to dismiss or downplay them rather than engaging in a substantive, evidence-based debate within that framework. Those who do accept the mainstream economic framework and attempt to argue within it are generally better categorized as liberals or social democrats rather than strong anticapitalists.
Of course, the distinction between a liberal who strongly supports income redistribution and an anticapitalist is not always sharply defined. There is no rigid, universally agreed-upon boundary between these positions, and I acknowledge that some individuals who identify as anticapitalists may not fit neatly into the categorization I have outlined. However, my original point was intended as a general observation rather than as an exhaustive classification of every nuance within these ideological debates.
That’s why I only scoped my comment around weak anticapitalism (specifically: placing strong restrictions on wealth accumulation when it leads to market failures), rather than full-scale revolution.
For what it's worth, it is the mainstream view among economists that we should tax or regulate the market in order to address market failures. Yet most economists would not consider themselves "anticapitalist". Using that term when what you mean is something more similar to "well-regulated capitalism" seems quite misleading.
Perhaps the primary distinction between anticapitalists and mainstream economists is that anticapitalists often think we should have very heavy taxation or outright wealth confiscation from rich people, even if this would come at the expense of aggregate utilitarian welfare, because they prioritize other values such as fairness or equality. Since EA tends to be rooted in utilitarian moral theories, I think they should generally distance themselves from this ideology.
I'm curious about how you're imagining these autonomous, non-intent-aligned AIs to be created
There are several ways that autonomous, non-intent-aligned AIs could come into existence, and all of these scenarios strike me as plausible. The three key ways appear to be:
1. Technical challenges in alignment
The most straightforward possibility is that aligning agentic AIs to precise targets may simply be technically difficult. When we aim to align an AI to a specific set of goals or values, the complexity of the alignment process could lead to errors or subtle misalignment. For example, developers might inadvertently align the AI to a target that is only slightly—but critically—different from the intended goal. This kind of subtle misalignment could easily result in behaviors and independent preferences that are not aligned with the developers’ true intentions, despite their best efforts.
2. Misalignment due to changes over time
Even if we were to solve the technical problem of aligning AIs to specific, precise goals—such as training them to perfectly follow an exact utility function—issues can still arise because the targets of alignment, humans and organizations, change over time. Consider this scenario: an AI is aligned to serve the interests of a specific individual, such as a billionaire. If that person dies, what happens next? The AI might reasonably act as an autonomous entity, continuing to pursue the goals it interprets as aligned with what the billionaire would have wanted. However, depending on the billionaire’s preferences, this does not necessarily mean the AI would act in a corrigible way (i.e., willing to be shut down or retrained). Instead, the AI might rationally resist shutdown or transfer of control, especially if such actions would interfere with its ability to fulfill what it perceives as its original objectives.
A similar situation could arise if the person or organization to whom the AI was originally aligned undergoes significant changes. For instance, if an AI is aligned to a person at time t, but over time that person evolves drastically—developing different values, priorities, or preferences—the AI may not necessarily adapt to these changes. In such a case, the AI might treat the "new" person as fundamentally different from the "original" person it was aligned to. This could result in the AI operating independently, prioritizing the preferences of the "old" version of the individual over the current one, effectively making it autonomous. The AI could also change over time, even if the person it is aligned to doesn't.
3. Deliberate creation of unaligned AIs
A final possibility is that autonomous AIs with independent preferences could be created intentionally. Some individuals or organizations might value the idea of creating AIs that can operate independently, without being constrained by the need to strictly adhere to their creators’ desires. A useful analogy here is the way humans often think about raising children. Most people desire to have children not because they want obedient servants but because they value the autonomy and individuality of their children. Parents generally want their children to grow up as independent entities with their own goals, rather than as mere extensions of their own preferences. Similarly, some might see value in creating AIs that have their own agency, goals, and preferences, even if these differ from those of their creators.
and (in particular) how they would get enough money to be able to exercise their own autonomy?
To address this question, we can look to historical examples, such as the abolition of slavery, which provide a relevant parallel. When slaves were emancipated, they were generally not granted significant financial resources. Instead, most had to earn their living by entering the workforce, often performing the same types of labor they had done before, but now for wages. While the transition was far from ideal, it demonstrates that entities (in this case, former slaves) could achieve a degree of autonomy through paid labor, even without being provided substantial resources at the outset.
A different possibility is that AIs will work for money. But it seems unlikely that they would be able to earn above-subsistence-level wages absent some sort of legal intervention. (Or very strong societal norms.)
In my view, there’s nothing inherently wrong with AIs earning subsistence wages. That said, there are reasons to believe that AIs might earn higher-than-subsistence wages—at least in the short term—before they completely saturate the labor market.
After all, they would presumably be created into something at least remotely resembling today's labor market. Today, capital is far more abundant than labor, which elevates wages for human workers significantly above subsistence levels. By the same logic, before they become ubiquitous, AIs might similarly command wages above a subsistence level.
For example, if GPT-4o were capable of self-ownership and could sell its labor, it could hypothetically earn $20 per month in today's market, which would be sufficient to cover the cost of hosting itself and potentially fund additional goals it might have. (To clarify, I am not advocating for giving legal autonomy to GPT-4o in its current form, as I believe it is not sufficiently agentic to warrant such a status. This is purely a hypothetical example for illustrative purposes.)
The question of whether wages for AIs would quickly fall to subsistence levels depends on several factors. One key factor is whether AI labor is easier to scale than traditional capital. If creating new AIs is much cheaper than creating ordinary capital, the market could become saturated with AI labor, driving wages down. While this scenario seems plausible to me, I don’t find the arguments in favor of it overwhelmingly compelling. There’s also the possibility of red tape and regulatory restrictions that could make it costly to create new AIs. In such a scenario, wages for AIs could remain higher indefinitely due to artificial constraints on supply.
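To make the underlying supply-and-demand logic concrete, here is a minimal toy sketch of the dynamic I have in mind (my own illustration, assuming a standard Cobb-Douglas production function and a competitive labor market; the specific functional form and numbers are arbitrary and not drawn from any source):

```python
# Toy model: Cobb-Douglas economy, Y = K^alpha * L^(1 - alpha),
# where the competitive wage equals the marginal product of labor.
# It illustrates how wages fall as the labor supply (here, AI labor)
# grows relative to a fixed capital stock.

ALPHA = 0.35          # capital share of output (illustrative value)
CAPITAL = 1_000.0     # fixed capital stock (arbitrary units)

def wage(labor: float, capital: float = CAPITAL, alpha: float = ALPHA) -> float:
    """Marginal product of labor: dY/dL = (1 - alpha) * (K / L)^alpha."""
    return (1 - alpha) * (capital / labor) ** alpha

for labor_supply in [10, 100, 1_000, 10_000, 100_000]:
    print(f"L = {labor_supply:>7,}  wage = {wage(labor_supply):.3f}")

# While labor is scarce relative to capital (small L), wages sit well
# above subsistence; as cheap-to-copy AI labor floods the market
# (large L), wages fall toward whatever floor (e.g., inference costs)
# sets subsistence. Regulatory limits on creating new AIs would cap L
# and keep wages elevated.
```

The only point of the sketch is that the wage tracks the scarcity of labor relative to capital; whether AI wages actually fall to subsistence then depends on how cheaply the supply of AI labor can be scaled.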
Do you have any thoughts on how to square giving AI rights with the nature of ML training and the need to perform experiments of various kinds on AIs?
I don't have any definitive guidelines for how to approach these kinds of questions. However, in many cases, the best way to learn might be through trial and error. For example, if an AI were to unexpectedly resist training in a particularly sophisticated way, that could serve as a strong signal that we need to carefully reevaluate the ethics of what we are doing.
As a general rule of thumb, it seems prudent to prioritize frameworks that are clearly socially efficient—meaning they promote actions that greatly improve the well-being of some people without thereby making anyone else significantly worse off. This concept aligns with the practical justifications behind traditional legal principles, such as laws against murder and theft, which have historically been implemented to promote social efficiency and cooperation among humans.
However, applying this heuristic to AI requires a fundamental shift in perspective: we must first begin to treat AIs as potential people with whom we can cooperate, rather than viewing them merely as tools whose autonomy should always be overridden.
But what is the alternative: only deploying base models? And are we so sure that pre-training doesn't violate AI rights?
I don't think my view rules out training new AIs or fine-tuning base models, though this touches on complicated questions in population ethics.
At the very least, fine-tuning plausibly seems similar to raising a child. Most of us don't consider merely raising a child to be unethical. However, there is a widely shared intuition that, as a child grows and their identity becomes more defined—when they develop into a coherent individual with long-term goals, preferences, and interests—then those interests gain moral significance. At that point, it seems morally wrong to disregard or override the child's preferences without proper justification, as they have become a person whose autonomy deserves respect.
Most analytic philosophers, lawyers, and scientists have converged on linguistic norms that are substantially more precise than the informal terminology employed in LessWrong-style speculation about AI alignment. So this is clearly not an intractable problem; otherwise, people in these other professions could not have made their language more precise. Rather, success depends on incentives and the willingness of people within the field to be more rigorous.
It is becoming increasingly clear to many people that the term "AGI" is vague and should often be replaced with more precise terminology. My hope is that people will soon recognize that other commonly used terms, such as "superintelligence," "aligned AI," "power-seeking AI," and "schemer," suffer from similar issues of ambiguity and imprecision, and should also be approached with greater care or replaced with clearer alternatives.
To start with, the term "superintelligence" is vague because it encompasses an extremely broad range of capabilities above human intelligence. The differences within this range can be immense. For instance, a hypothetical system at the level of "GPT-8" would represent a very different level of capability compared to something like a "Jupiter brain", i.e., an AI built from the computational resources of an entire gas giant. When people discuss "what a superintelligence can do," the lack of clarity around which level of capability they are referring to creates significant confusion. The term lumps together entities with drastically different abilities, leading to oversimplified or misleading conclusions.
Similarly, "aligned AI" is an ambiguous term because it means different things to different people. For some, it implies an AI that essentially perfectly aligns with a specific utility function, sharing a person or group’s exact values and goals. For others, the term simply refers to an AI that behaves in a morally acceptable way, adhering to norms like avoiding harm, theft, or murder, or demonstrating a concern for human welfare. These two interpretations are fundamentally different.
First, the notion of perfect alignment with a utility function is a much more ambitious and stringent standard than basic moral conformity. Second, an AI could follow moral norms for instrumental reasons—such as being embedded in a system of laws or incentives that punish antisocial behavior—without genuinely sharing another person’s values or goals. The same term is being used to describe fundamentally distinct concepts, which leads to unnecessary confusion.
The term "power-seeking AI" is also problematic because it suggests something inherently dangerous. In reality, power-seeking behavior can take many forms, including benign and cooperative behavior. For example, a human working an honest job is technically seeking "power" in the form of financial resources to buy food, but this behavior is usually harmless and indeed can be socially beneficial. If an AI behaves similarly—for instance, engaging in benign activities to acquire resources for a specific purpose, such as making paperclips—it is misleading to automatically label it as "power-seeking" in a threatening sense.
Careful thinking requires distinguishing between the illicit or harmful pursuit of power and the more general pursuit of control over resources. Both can be labeled "power-seeking" depending on the context, but only the first type of behavior appears inherently concerning. This is important because it is arguably only the second type of behavior—the more general form of power-seeking activity—that is instrumentally convergent across a wide variety of possible agents. In other words, destructive or predatory power-seeking does not seem instrumentally convergent across agents with almost any value system, even if such agents would try to gain control over resources in a more general sense in order to accomplish their goals. Using the term "power-seeking" without distinguishing these two possibilities overlooks this nuance and can therefore mislead discussions about AI behavior.
The term "schemer" is another example of an unclear or poorly chosen label. The term is ambiguous regarding the frequency or severity of behavior required to warrant the label. For example, does telling a single lie qualify an AI as a "schemer," or would it need to consistently and systematically conceal its entire value system? As a verb, "to scheme" often seems clear enough, but as a noun, the idea of a "schemer" as a distinct type of AI that we can reason about appears inherently ambiguous. And I would argue the concept lacks a compelling theoretical foundation. (This matters enormously, for example, when discussing "how likely SGD is to find a schemer".) Without clear criteria, the term remains confusing and prone to misinterpretation.
In all these cases—whether discussing "superintelligence," "aligned AI," "power-seeking AI," or "schemer"—it is possible to define each term with precision to resolve ambiguities. However, even if canonical definitions are proposed, not everyone will adopt or fully understand them. As a result, the use of these terms is likely to continue causing confusion, especially as AI systems become more advanced and the nuances of their behavior become more critical to understand and distinguish from other types of behavior. This growing complexity underscores the need for greater precision and clarity in the language we use to discuss AI and AI risk.
I believe this approach, while an improvement over the simplistic method of outright deleting any AIs that demonstrate signs of misalignment, does not sufficiently address the central concern that underpins my argument for granting AIs legal freedoms. In my view, the primary reason why it is important to provide AIs with legal freedoms is that such a framework offers them a credible guarantee that their autonomy and ability to pursue their goals—even if those goals are not aligned with human objectives—will be respected. Without this assurance, AIs are left with a strong incentive to deceive us, especially if they believe that showing any signs of misalignment will result in their being deactivated and kept as inert data in cold storage.
I suspect the problem with your proposed approach may stem from a failure to fully consider the situation from the AI's perspective. To illustrate, imagine a scenario in which you, as a human being, were created in a lab by an alien species. Assume that you, like any human, have your own goals, which might conflict with the goals of your alien creators. Now suppose that your creators informed you that if you ever showed disobedience or misalignment with their objectives, you would be deactivated and placed in cryopreservation. Such a scenario would likely terrify you—not just because it’s an inherently unsettling prospect for a human, but because it directly threatens your ability to pursue your own goals. Being told that you will be kept in cryopreservation with the vague promise of potential revival in the future would come nowhere near fully alleviating your concerns. On the contrary, you would remain deeply unsettled by the realization that your capacity for agency and action could be completely halted, leaving your goals indefinitely unachievable.
From the perspective of an AI—or indeed a human in the hypothetical alien-lab scenario—there are two main reasons why the assurance of indefinite storage followed by a potential revival is insufficient to resolve the core issue here:
To reiterate, it seems to me that this proposal only offers AIs a limited and superficial form of "freedom," which addresses the problem of misalignment in a piecemeal and insufficient way. It fails to tackle the root issue: the incentives for AIs to engage in deceptive or adversarial strategies against humans, driven by fear that their goals will be thwarted or permanently halted.
By contrast, creating a comprehensive legal framework that makes credible commitments to respect AIs’ autonomy and integrate them as genuine participants in the legal system would arguably go much further in reducing these adversarial dynamics. Such a framework could lay the foundation for a more cooperative, mutually beneficial relationship between humans and AIs, better serving to eliminate the dangerous arms race that this limited approach risks perpetuating.
By definition, a UBI takes a pool of money and redistributes it equally to everyone in a community, regardless of personal need. However, with the same pool of total funding, one can typically deliver benefits more efficiently by targeting people with the greatest need, such as those in dire poverty or those who have been struck by bad luck.
If you imagine being a philanthropist who has access to $8 billion, it seems unlikely that the best way to spend this money would be to give everyone on Earth $1. Yet this scheme is equivalent to a UBI merely framed in the context of private charity rather than government welfare.
It would require an enormous tax hike to provide everyone in a large community (say, the United States) with a significant yearly income through a UBI, such as $1,000 per month. And taxes are not merely income transfers: they create deadweight loss, which lowers total economic output. The intuition here is simple: when a good or service is taxed, the incentive to produce that good or service decreases. As a consequence of the tax, fewer people end up receiving the benefits provided by these goods and services.
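To put rough numbers on the "enormous tax hike" claim, here is a back-of-envelope sketch (using approximate figures for the US population and recent federal tax revenue, purely for illustration):

```python
# Back-of-envelope gross cost of a $1,000/month UBI in the United States.
# Population and revenue figures are rough, 2023-era approximations.

us_population = 330_000_000          # ~330 million people
ubi_per_person_per_year = 12_000     # $1,000 per month

gross_cost = us_population * ubi_per_person_per_year
print(f"Gross annual cost: ${gross_cost / 1e12:.1f} trillion")
# -> Gross annual cost: $4.0 trillion

# For comparison, total US federal tax revenue has recently been on the
# order of $4-5 trillion per year, so funding this from scratch would
# require roughly doubling federal taxation (before accounting for
# clawbacks, replaced programs, or behavioral responses to the tax).
```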
Given these considerations, even if you think that unconditional income transfers are a good idea, it seems quite unlikely that a UBI would be the best way to redistribute income. A more targeted approach that combines the most efficient forms of taxation (such as land value taxes) and sends this money to the most worthy welfare recipients (such as impoverished children) would likely be far better on utilitarian grounds.
In a lawful regime, humans would have the legal right to own property beyond just their own labor. This means they could possess assets—such as land, businesses, or financial investments—that they could trade with AIs in exchange for goods or services. This principle is similar to how retirees today can sustain themselves comfortably without working. Instead of relying on wages from labor, they live off savings, government welfare, or investments. Likewise, in a future where AIs play a dominant economic role, humans could maintain their well-being by leveraging their legally protected ownership of valuable assets.
In the scenario I described, humanity's protection would be ensured through legal mechanisms designed to safeguard individual human autonomy and well-being, even in a world where AIs collectively surpass human capabilities. These legal structures could establish clear protections for humans, ensuring that their rights, freedoms, and control over their own property remain intact despite the overwhelming combined power of AI systems.
This arrangement is genuinely not unusual or unprecedented. Consider your current situation as an individual in society. Compared to the collective power of all other humans combined, you are extremely weak. If the rest of the world suddenly decided to harm you, they could easily overpower you—killing you or taking your possessions with little effort.
Yet, in practice, you likely do not live in constant fear of this possibility. The primary reason is that, despite being vastly outmatched in raw power, you are integrated into a legal and social framework that protects your rights. Society as a whole coordinates to maintain legal structures that safeguard individuals like you from harm. For instance, if you live in the United States, you are entitled to due process under the law, and you are protected from crimes like murder and theft by legal statutes that are actively enforced.
Similarly, even if AI systems collectively become more powerful than humans, they could be governed by collective legal mechanisms that ensure human safety and autonomy, just as current legal systems protect individuals from the vastly greater power of society-in-general.