On the limits of idealized values

Joe_Carlsmith

On a popular view about meta-ethics, what you should value is determined by what an idealized version of you would value. Call this view “idealizing subjectivism.”

Idealizing subjectivism has been something like my best-guess meta-ethics. And lots of people I know take it for granted. But I also feel nagged by various problems with it — in particular, problems related to (a) circularity, (b) indeterminacy, and (c) “passivity.” This post reflects on such problems.

My current overall take is that, especially absent certain strong empirical assumptions, idealizing subjectivism is ill-suited to the role some hope it can play: namely, providing a privileged and authoritative (even if subjective) standard of value. Rather, the version of the view I favor mostly reduces to the following (mundane) observations:

If you already value X, it’s possible to make instrumental mistakes relative to X.
You can choose to treat the outputs of various processes, and the attitudes of various hypothetical beings, as authoritative to different degrees.

This isn’t necessarily a problem. To me, though, it speaks against treating your “idealized values” the way a robust meta-ethical realist treats the “true values.” That is, you cannot forever aim to approximate the self you “would become”; you must actively create yourself, often in the here and now. Just as the world can’t tell you what to value, neither can your various hypothetical selves — unless you choose to let them. Ultimately, it’s on you.

I. Clarifying the view

Let’s define the view I have in mind a little more precisely:

Idealizing subjectivism: X is intrinsically valuable, relative to an agent A, if and only if, and because, A would have some set of evaluative attitudes towards X, if A had undergone some sort of idealization procedure.

By evaluative attitudes, I mean things like judgments, endorsements, commitments, cares, desires, intentions, plans, and so on. Different versions of the view can differ in which they focus on.

Example types of idealization might include: full access to all relevant information; vivid imaginative acquaintance with the relevant facts; the limiting culmination of some sort of process of reflection, argument, and/or negotiation/voting/betting between representatives of different perspectives; the elimination of “biases”; the elimination of evaluative attitudes that you don’t endorse or desire; arbitrary degrees of intelligence, will-power, dispassion, empathy, and other desired traits; consistency; coherence; and so on.

Note that the “and because” in the definition is essential. Without it, we can imagine paradigmatically non-subjectivist views that qualify. For example, it could be that the idealization procedure necessarily results in A’s recognizing X’s objective, mind-independent value, because X’s value is one of the facts that falls under “full information.” Idealizing subjectivism explicitly denies this sort of picture: the point is that A’s idealized attitudes make X valuable, relative to A. (That said, views on which all idealized agents converge in their evaluative attitudes can satisfy the definition above, provided that value is explained by the idealized attitudes in question, rather than vice versa.)

“Relative to an agent A,” here, means something like “generating (intrinsic) practical reasons for A.”

II. The appeal

Why might one be attracted to such a view? Part of the appeal, I think, comes from resonance with three philosophical impulses:

A rejection of certain types of robust realism about value, on which value is just a brute feature of the world “out there.”
A related embrace of a kind of Humeanism about means and ends. The world can tell you the means to your ends, but it cannot tell you what ends to pursue — those must in some sense be there already, in your (idealized?) heart.
An aspiration to maintain some kind of deep connection between what’s valuable, and what actually moves us to act (though note that this connection is not universalized — e.g., what’s valuable relative to you may not be motivating to others).

Beyond this, though, a key aim of idealizing subjectivism (at least for me) is to capture the sense in which it’s possible to question what you should value, and to make mistakes in your answer. That is, the idealization procedure creates some distance between your current evaluative attitudes, and the truth about what’s valuable (relative to you). Things like “I want X,” or “I believe that X is valuable” don’t just settle the question.

This seems attractive in cases like:

(Factual mistake) Alfred wants his new “puppy” Doggo to be happy. Doggo, though, is really a simple, non-conscious robot created by mischievous aliens. If Alfred knew Doggo’s true nature, he would cease to care about Doggo in this way.
(Self knowledge) Betty currently feels very passionately about X cause. If she knew, though, that her feelings were really the product of a desire to impress her friend Beatrice, and to fit in with her peers more broadly, she’d reject them. (This example is inspired by one from Yudkowsky here.)
(Philosophical argument) Cindy currently thinks of herself as an average utilitarian, and she goes around trying to increase average utility. However, if she learned more about the counterintuitive implications of average utilitarianism, she would switch to trying to increase total utility instead.
(Vividness) Denny knows that donating $10,000 to the Against Malaria Foundation, instead of buying a new grand piano, would save multiple lives in expectation. He’s currently inclined to buy the grand piano. However, if he imagined more vividly what it means to save these lives, and/or if he actually witnessed the impact that saving these lives would have, he’d want to donate instead.
(Weakness of the will) Ernesto is trapped by a boulder, and he needs to cut off his own hand to get free, or he’ll die. He really doesn’t want to cut off his hand. However, he would want himself to cut off the hand, if he could step back and reflect dispassionately.
(Incoherence) Francene prefers vacationing in New York to San Francisco, San Francisco to LA, and LA to New York, and she pays money to trade “vacation tickets” in a manner that reflects these preferences. However, if she reflected more on her vulnerability to losses this way, she’d resolve her circular preferences into New York > SF > LA.
(Inconsistency) Giovanni’s intuitions are (a) it’s impermissible to let a child drown in order to save an expensive suit, (b) it’s permissible to buy a suit instead of donating the money to save a distant child, and (c) there’s no morally relevant difference between these cases. If he had to give one of these up, he’d give up (c).
(Vicious desires) Harriet feels a sadistic desire for her co-worker to fail and suffer, but she wishes that she didn’t feel this desire.

By appealing to the hypothetical attitudes of these agents, the idealizing subjectivist aims to capture a sense that their actual attitudes are, or at least could be, in error.
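To make the instrumental cost in the Incoherence case concrete, here is a minimal sketch of the "money pump" that Francene's circular preferences expose her to. The $1 fee per trade, the sequence of offers, and the function name `trade` are all illustrative assumptions, not details from the example itself.

```python
# Toy model of the "Incoherence" case. The $1 fee and the offer sequence
# are illustrative assumptions, not details from the example itself.

# Each pair (a, b) means "a is strictly preferred to b".
CYCLIC = {("New York", "San Francisco"), ("San Francisco", "LA"), ("LA", "New York")}
RESOLVED = {("New York", "San Francisco"), ("San Francisco", "LA"), ("New York", "LA")}

def trade(prefers, start, offers, fee=1):
    """Pay `fee` to swap tickets whenever the offered city is preferred to the held one."""
    holding, paid = start, 0
    for offer in offers:
        if (offer, holding) in prefers:
            holding, paid = offer, paid + fee
    return holding, paid

offers = ["San Francisco", "New York", "LA"] * 2

print(trade(CYCLIC, "LA", offers))    # ends where it started, having paid for every swap
print(trade(RESOLVED, "LA", offers))  # stops trading once it holds its top choice
```

Note that the sketch only exhibits the loss. It says nothing about which resolution of the cycle is the right one, relative to Francene; New York > SF > LA is simply the one the example stipulates.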

Finally, idealizing subjectivism seems to fit with our actual practices of ethical reflection. For example, when thinking about value, we often ask questions like: “what would I think/feel if I understood this situation better?”, “what would I think if I weren’t blinded by X emotion or bias?” and so forth — questions reminiscent of idealization. And ethical debate often involves seeking a kind of reflective equilibrium — a state that some idealizers take as determining what’s valuable, rather than indicating it.

These, then, are among the draws of idealizing subjectivism (there are others) — though note that whether the view can actually deliver these goods (anti-realism, Humeanism, fit with our practices, etc) is a further question, which I won’t spend much time on.

What about objections? One common objection is that the view yields counterintuitive results. Plausibly, for example, we can imagine ideally-coherent suffering maximizers, brick-eaters, agents who are indifferent towards future agony, agents who don’t care about what happens on future Tuesdays, and so on — agents whose pursuit of their values, it seems, need involve no mistakes (relative to them). We can debate which of such cases the idealizing subjectivist must concede, but pretty clearly: some. In a sense, cases like this lie at the very surface of the view. They’re the immediate implications.

(As I’ve discussed previously, we can also do various semantic dances, here, to avoid saying certain relativism-flavored things. For example, we can make “a paperclip maximizer shouldn’t clip” true in a hedonist’s mouth, or a paperclip maximizer’s statement “I should clip” false, evaluated by a hedonist. Ultimately, though, these moves don’t seem to me to change the basic picture much.)

My interest here is in a different class of more theoretical objections. I wrote about one of these in my post about moral authority. This post examines some others. (Many of them, as well as many of the examples I use throughout the post, can be found elsewhere in the literature in some form or other.)

III. Which idealization?

Consider Clippy, the paperclip maximizing robot. On a certain way of imagining Clippy, its utility function is fixed and specifiable independent of its behavior, including behavior under “idealized conditions.” Perhaps we imagine that there is a “utility function slot” inside of Clippy’s architecture, in which the programmers have written “maximize paperclips!” — and it is in virtue of possessing this utility function that Clippy consistently chooses more paperclips, given idealized information. That is, Clippy’s behavior reveals Clippy’s values, but it does not constitute those values. The values are identifiable by other means (e.g., reading what’s written in the utility function slot).

If your values are identifiable by means other than your behavior, and if they are already coherent, then it’s much easier to distinguish between candidate idealization procedures that preserve your values vs. changing them. Holding fixed the content of Clippy’s “utility function slot,” for example, we can scale up Clippy’s knowledge, intelligence, etc, while making sure that the resulting, more sophisticated agent is also a paperclip maximizer.

But note that in such a case, appeals to idealization also don’t seem to do very much useful normative work, for subjectivists. To explain what’s of value relative to this sort of Clippy, that is, we can just look directly at Clippy’s utility function. If humans were like this, we could just look at a human’s “utility function slot,” too. No fancy idealization necessary.

But humans aren’t like this. We don’t have a “utility function slot” (or at least, I’ll assume as much in what follows; perhaps this — more charitably presented — is indeed an important point of dispute). Rather, our beliefs, values, heuristics, cognitive procedures, and so on are, generally speaking, a jumbled, interconnected mess (here I think of a friend’s characterization, expressed with a tinge of disappointment and horror: “an unholy and indeterminate brew of these … sentiments”). The point of idealizing subjectivism is to take this jumbled mess as an input to an idealization procedure, and then to output something that plays the role of Clippy’s utility function — something that will constitute, rather than reveal, what’s of value relative to us.

In specifying this idealization procedure, then, we don’t have the benefit of holding fixed the content of some slot, or of specifying that the idealization procedure can’t “change your values.” Your values (or at least, the values we care about not changing) just are whatever comes out the other side of the idealization procedure.

Nor, importantly, can we specify the idealization procedure via reference to some independent truth that its output needs to track. True, we evaluate the “ideal-ness” of other, more epistemic procedures this way (e.g., the ideal judge of the time is the person whose judgment actually tracks what time it is — see Enoch (2005)). But the point of idealizing subjectivism is that there is no such independent truth available.

Clearly, though, not just any idealization procedure will do. Head bonkings, brainwashings, neural re-wirings — starting with your current brain, we can refashion you into a suffering-maximizer, a brick-eater, a helium-maximizer, you name it. So how are we to distinguish between the “ideal” procedures, and the rest?

IV. Galaxy Joe

To me, this question gains extra force from the fact that your idealized self, at least as standardly specified, will likely be a quite alien creature. Consider, for example, the criterion, endorsed in some form by basically every version of idealizing subjectivism, that your idealized self possess “full information” (or at least, full relevant information — but what determines relevance?). This criterion is often treated casually, as though a run-of-the-mill human could feasibly satisfy it with fairly low-key modifications. But my best guess is that to the extent that possessing “full information” is a thing at all, the actual creature to imagine is more like a kind of God — a being (or perhaps, a collection of beings) with memory capacity, representational capacity, and so on vastly exceeding that of any human. To evoke this alien-ness concretely, let’s imagine a being with a computationally optimal brain the size of a galaxy. Call this a “galaxy Joe.”

Here, we might worry that no such galaxy Joe could be “me.” But it’s not clear why this would matter, to idealizing subjectivists: what’s valuable, relative to Joe, could be grounded in the evaluative attitudes of galaxy Joe, even absent a personal identity relation between them. The important relation could, for example, be some form of psychological continuity (though I’ll continue to use the language of self-hood in what follows).

Whether me or not, though: galaxy Joe seems like he’ll likely be, from my perspective, a crazy dude. It will be hard/impossible to understand him, and his evaluative attitudes. He’ll use concepts I can’t represent. His ways won’t be my ways.

Suppose, for example, that a candidate galaxy Joe — a version of myself created by giving original me “full information” via some procedure involving significant cognitive enhancement — shows me his ideal world. It is filled with enormously complex patterns of light ricocheting off of intricate, nano-scale, mirror-like machines that appear to be in some strange sense “flowing.” These, he tells me, are computing something he calls [incomprehensible galaxy Joe concept (IGJC) #4], in a format known as [IGJC #5], undergirded and “hedged” via [IGJC #6]. He acknowledges that he can’t explain the appeal of this to me in my current state.

“I guess you could say it’s kind of like happiness,” he says, warily. He mentions an analogy with abstract jazz.

“Is it conscious?” I ask.

“Um, I think the closest short answer is ‘no,’” he says.

Of course, by hypothesis, I would become him, and hence value what he values, if I went through the procedure that created him — one that apparently yields full information. But now the question of whether this is a procedure I “trust,” or not, looms large. Has galaxy Joe gone off the rails, relative to me? Or is he seeing something incredibly precious and important, relative to me, that I cannot?

The stakes are high. Suppose I can create either this galaxy Joe’s favorite world, or a world of happy puppies frolicking in the grass. The puppies, from my perspective, are a pretty safe bet: I myself can see the appeal. Expected value calculations under moral uncertainty aside, suppose I start to feel drawn towards the puppies. Galaxy Joe tells me with grave seriousness: “Creating those puppies instead of IGJC #4 would be a mistake of truly ridiculous severity.” I hesitate. Is he right, relative to me? Or is he basically, at this point, an alien, a paperclip maximizer, for all his humble roots in my own psychology?

Is there an answer?

V. Mind-hacking vs. insight

Here’s a related intuition pump. Just as pills and bonks on the head can change your evaluative attitudes, some epistemically-flavored stimuli can do so, too. Some such changes we think of as “legitimate persuasion” or “value formation,” others we think of as being “brainwashed,” “mind-hacked,” “reprogrammed,” “misled by rhetoric and emotional appeals,” and so on. How do we tell (or define) the difference?

Where there are independent standards of truth, we can try appealing to them. E.g., if Bob, a fiery orator, convinces you that two plus two is five, you’ve gone astray (though even cases like this can get tricky). But in the realm of pure values, and especially absent other flagrant reasoning failures, it gets harder to say.

One criterion might be: if the persuasion process would’ve worked independent of its content, this counts against its legitimacy (thanks to Carl Shulman for discussion). If, for example, Bob, or exposure to a certain complex pattern of pixels, can convince you of anything, this might seem a dubious source of influence. That said, note that certain common processes of value formation — for example, attachment to your hometown, or your family — are “content agnostic” to some extent (e.g., you would’ve attached to a different hometown, or a different family, given a different upbringing); and ultimately, different evolutions could’ve built wildly varying creatures. And note, too, that some standard rationales for such a criterion — e.g., being convinced by Bob/the pixels doesn’t correlate sufficiently reliably with the truth — aren’t in play here, since there’s no independent truth available.

Regardless, though, this criterion isn’t broad enough. In particular, some “mind-hacking” memes might work because of their content — you can’t just substitute in arbitrary alternative messages. Indeed: one wonders, and worries, about what sort of Eldritch horrors might be lurking in the memespace, ready and able, by virtue of their content, to reprogram and parasitize those so foolish, and incautious, as to attempt some sort of naive acquisition of “full information.”

To take a mundane example: suppose that reading a certain novel regularly convinces people to become egoists, and you learn, to your dismay (you think of yourself as an altruist), that it would convince you to become so, too, if you read it. Does your “idealization procedure” involve reading it? You’re not used to avoiding books, and this one contains, let’s suppose, no falsehoods or direct logical errors. Still, on one view, the book is, basically, brainwashing. On another, the book is a window onto a new and legitimately more compelling vision of life. By hypothesis, you’d take the latter view after reading. But what’s the true view?

Or suppose that people who spend time in bliss-inducing experience machines regularly come to view time spent in such machines as the highest good, because their brains receive such strong reward signals from the process, though not in a way different in kind from other positive experiences like travel, fine cuisine, romantic love, and so on (thanks to Carl Shulman for suggesting this example). You learn that you, too, would come to view machine experiences this way, given exposure to them, despite the fact that you currently give priority to non-hedonic goods. Does your idealization process involve entering such machines? Would doing so result in a “distortion,” an (endorsed, desired) “addiction”; or would it show you something you’re currently missing — namely, just how intrinsically good, relative to you, these experiences really are?

Is there an answer?

As with the candidate galaxy Joe above, what’s needed here is some way of determining which idealization procedures are, as it were, the real deal, and which create imposters, dupes, aliens; which brain-wash, alter, or mislead. I’ll consider three options for specifying the procedure in question, namely:

Without reference to your attitudes/practices.
By appeal to your actual attitudes/practices.
By appeal to your idealized attitudes/practices.

All of these, I think, have problems.

VI. Privileged procedures

Is there some privileged procedure for idealizing someone, that we can specify and justify without reference to that person’s attitudes (actual or ideal)? To me, the idea of giving someone “full information” (including logical information), or of putting them in a position of “really understanding” (assuming, perhaps wrongly, that we can define this in fully non-evaluative terms) is the most compelling candidate. Indeed, when I ask myself whether, for example, IGJC #4 is really good (relative to me), I find myself tempted to ask: “how would I feel about it, if I really understood it?”. And the question feels like it has an answer.

One justification for appealing to something like “full information” or “really understanding” is: it enables your idealized self to avoid instrumental mistakes. Consider Alfred, owner of Doggo above. Because Alfred doesn’t know Doggo’s true nature (e.g., a simple, non-conscious robot), Alfred doesn’t know what he’s really causing, when he e.g. takes Doggo to the park. He thinks he’s causing a conscious puppy to be happy, but he’s not. Idealized Alfred knows better. Various other cases sometimes mentioned in support of idealizing — e.g., someone who drinks a glass of petrol, thinking it was gin — can also be given fairly straightforward instrumental readings.

But this justification seems too narrow. In particular: idealizers generally want the idealization process to do more than help you avoid straightforward instrumental mistakes. In cases 1-8 above, for example, Alfred’s is basically the only one that fits this instrumental mold straightforwardly. The rest involve something more complex — some dance of “rewinding” psychological processes (see more description here), rejecting terminal (or putatively terminal) values on the basis of their psychological origins, and resolving internal conflicts by privileging some evaluative attitudes, stances, and intuitions over others. That is, the idealization procedure, standardly imagined, is supposed to do more than take in someone who already has and is pursuing coherent values, and tell them how to get what they want; that part is (theoretically) easy. Rather, it’s supposed to take in an actual, messy, internally conflicted human, and output coherent values — values that are in some sense “the right answer” relative to the human in question.

Indeed, I sometimes wonder whether the appeal of idealizing subjectivism rests too much on people mistaking its initial presentation for the more familiar procedure of eliminating straightforward instrumental mistakes. In my view, if we’re in a theoretical position to just get rid of instrumental mistakes, then we’re already cooking with gas, values-wise. But the main game is messier — e.g., using hypothetical selves (which?) to determine what counts as an instrumental mistake, relative to you.

There’s another, subtly different justification for privileging “full information,” though: namely, that once you’ve got full information, then (assuming anti-realism about values) you’ve got everything that the world can give you. That is: there’s nothing about reality that you’re, as it were, “missing” — no sense in which you should hesitate from decision, on the grounds that you might learn something new, or be wrong about some independent truth. The rest, at that point, is up to you.

I’m sympathetic to this sort of thought. But I also have a number of worries about it.

One (fairly minor) is whether it justifies baking full information into the idealization procedure, regardless of the person’s attitudes towards acquiring such information. Consider someone with very limited interest in the truth, and whose decision-making process, given suitable opportunity, robustly involves actively and intentionally self-modifying to close off inquiry and lock in various self-deceptions/falsehoods. Should we still “force” this person’s idealized self to get the whole picture before resolving questions like whether to self-deceive?

A second worry, gestured at above, is that the move from my mundane self to a being with “full information” is actually some kind of wild and alien leap: a move not from Joe to “Joe who has gotten out a bit more, space and time-wise” but from Joe to galaxy Joe, from Joe to a kind of God. And this prompts concern about the validity of the exercise.

Consider its application to a dog, or an ant. What would an ant value, if it had “full information”? What, for that matter, would a rock value, if it had full information? If I were a river, would I flow fast, or slow? If I were an egg, would I be rotten? Starting with a dog, or an ant, or a rock, we can create a galaxy-brained God. Or, with the magic of unmoored counterfactuals, we can “cut straight to” some galaxy-brained God or other, via appeal to some hazy sort of “similarity” to the dog/ant/rock in question, without specifying a process for getting there — just as we can try to pick an egg that I would be, if I were an egg. With dogs, or ants, though, and certainly with rocks, it seems strange to give the resulting galaxy-brain much authority, with respect to what the relevant starting creature/rock “truly values,” or should. In deciding whether to euthanize your dog Fido, should you ask the nearest galaxy-brained former-Fido? If not, are humans different? What makes them so?

This isn’t really a precise objection; it’s more of a hazy sense that if we just ask directly “how would I feel about X, if I were a galaxy brain?”, we’re on shaky ground. (Remember, we can’t specify my values independently, hold them fixed, and then require that the galaxy brain share them; the whole point is that the galaxy brain’s attitudes constitute my values.)

A third worry is about indeterminacy. Of the many candidate ways of creating a fully informed galaxy Joe, starting with actual me, it seems possible that there will be important path-dependencies (this possibility is acknowledged by many idealizers). If you learn X information, or read Y novel, or have Z experience, before some alternatives (by hypothesis, you do all of it eventually), you will arrive at a very different evaluative endpoint than if the order were reversed. Certainly, much real-life value formation has this contingent character: you meet Suzy, who loves the stoics, is into crypto, and is about to start a medical residency, so you move to Delaware with her, read Seneca, start hanging out with libertarians, and so on. Perhaps such contingency persists in more idealized cases, too. And if we try to skip over process and “cut straight to” a galaxy Joe, we might worry, still, that equally qualified candidates will value very different things: “full information” just isn’t enough of a constraint.
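The path-dependence worry can be illustrated with a toy sketch in which two stimuli are both eventually received, but the evaluative endpoint depends on their order. Everything here is a made-up assumption for illustration: the two values, their starting weights, and the update rules `read_novel` and `vivid_experience` are hypothetical, and nothing suggests that real value formation works this way.

```python
from itertools import permutations

def read_novel(values):
    """Hypothetical stimulus: reinforces whichever value currently dominates."""
    top = max(values, key=values.get)
    return {k: v + 2 if k == top else v for k, v in values.items()}

def vivid_experience(values):
    """Hypothetical stimulus: adds a fixed boost to altruism."""
    return {**values, "altruism": values["altruism"] + 2}

def endpoints(start, stimuli):
    """Apply every stimulus in every possible order; record the dominant value at the end."""
    results = {}
    for order in permutations(stimuli):
        values = dict(start)
        for stimulus in order:
            values = stimulus(values)
        names = tuple(f.__name__ for f in order)
        results[names] = max(values, key=values.get)
    return results

start = {"altruism": 1, "egoism": 2}
for order, winner in endpoints(start, [read_novel, vivid_experience]).items():
    print(" then ".join(order), "->", winner)
```

In this sketch both orderings take in exactly the same inputs, yet one ends up dominated by egoism and the other by altruism, because the updates do not commute. If idealization has this structure, specifying the inputs alone does not fix the output.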

(More exotically, we might also worry that amongst all the evaluative Eldritch horrors lurking in the memespace, there is one that always takes over all of the Joes on their way to becoming fully-informed galaxy Joes, no matter what they do to try to avoid it, but which is still in some sense “wrong.” Or that full information, more generally, always involves memetic hazards that are fatal from an evaluative perspective. It’s not clear that idealizing subjectivism has the resources to accommodate distinctions between such hazards and the evaluative truth. That said, these hypotheses also seem somewhat anti-Humean in flavor. E.g., can’t fully-informed minds value any old thing?)

Worries about indeterminacy become more pressing once we recognize all the decisions a galaxy Joe is going to have to make, and all of the internal evaluative conflicts he will have to resolve (between object-level and meta preferences, competing desires, contradictory intuitions, and the like), that access to “full information” doesn’t seem to resolve for him. Indeed, the Humean should’ve been pessimistic about the helpfulness of “full information” in this regard from the start. If, by Humean hypothesis, your current, imperfect knowledge of the world can’t tell you what to want for its own sake, and/or how to resolve conflicts between different intrinsic values, then perfect knowledge won’t help, either: you still face what is basically the same old game, with the same old gap between is and ought, fact and value.

Beyond accessing “full information,” is there a privileged procedure for playing this game, specifiable without reference to the agent’s actual or idealized attitudes? Consider, for example, the idea of “reflective equilibrium” in ethics — the hypothesized, stable end-state of a process of balancing more specific intuitions with more general principles and theoretical considerations. How, exactly, is this balance to be struck? What weight, for example, should be given to theoretical simplicity and elegance, vs. fidelity to intuition and common sense? In contexts with independent standards of accuracy, we might respond to questions like this with reference to the balance most likely to yield the right answer; but for the idealizer, there is not yet a right answer to be sought; rather, the reflective equilibrium process makes its output right. But which reflective equilibrium process?

Perhaps we might answer: whatever reflective equilibrium process actually works in the cases where there is a right answer (thanks to Nick Beckstead for discussion). That is, you should import the reasoning standards you can actually evaluate for accuracy (for example, the ones that work in e.g. physics, math, statistics, and so on) into a domain (value) with no independent truth. Thus, for example, if simplicity is a virtue in science, because (let’s assume) the truth is often simple, it should be a virtue in ethics, too. But why? Why not do whatever’s accurate in the case where accuracy is a thing, and then something else entirely in the domain where you can’t go wrong, except relative to your own standards?

(We can answer, here, by appeal to your actual or idealized attitudes: e.g., you just do, in fact, use such-and-such standards in the evaluative domain, or would if suitably idealized. I discuss these options in the next sections. For now, the question is whether we can justify particular idealization procedures absent such appeals.)

Or consider the idea that idealization involves or is approximated by “running a large number of copies of yourself, who then talk/argue a lot with each other and with others, have a bunch of markets, and engage in lots of voting and trading and betting” (see e.g. Luke Muehlhauser’s description here), or that it involves some kind of “moral parliament.” What sorts of norms, institutions, and procedures structure this process? How does it actually work? Advocates of these procedures rarely say in any detail (though see here for one recent discussion); but presumably the intended answer is “the best procedures, markets, voting norms, etc.” But is there a privileged “best,” specifiable and justifiable without appeal to the agent’s actual/idealized attitudes? Perhaps we hope that the optimal procedures are just there, shining in their optimality, identifiable without any object-level evaluative commitments (here Hume and others say: what?), or more likely, given any such commitments. My guess, though, is that absent substantive, value-laden assumptions about veils of ignorance and the like, and perhaps even given such assumptions, this hope is over-optimistic.

The broader worry, here, is that once we move past “full information,” and start specifying the idealization procedure in more detail (e.g., some particular starting state, some particular type of reflective equilibrium, some particular type of parliament), or positing specific traits that the idealized self needs to have (vivid imagination, empathy, dispassion, lack of “bias,” etc), our choice of idealization will involve (or sneak in) object-level value judgments that we won’t be able to justify as privileged without additional appeal to the agent’s (actual or idealized) attitudes. Why vivid imagination, or empathy (to the extent they add anything on top of “full information”)? Why a cool hour, instead of a hot one? What counts as an evaluative bias, if there is no independent evaluative truth? The world, the facts, don’t answer these questions.

If we can’t appeal to the world to identify a privileged idealization procedure, it seems we must look to the agent instead. Let’s turn to that option now.

VII. Appeals to actual attitudes

Suppose we appeal to your actual attitudes about idealization procedures, in fixing the procedure that determines what’s of value relative to you. Thus, if we ask: why this particular reflective equilibrium? We answer: because that’s the version you in fact use/endorse. Why this type of parliament, these voting norms? They’re the ones you in fact favor. Why empathy, or vivid imagination, or a cool hour? Because you like them, prefer them, trust them. And so on.

Indeed, some idealization procedures make very explicit reference to the “idealized you” that you yourself want to be/become. In cases like “vicious desires” above, for example, your wanting not to have a particular desire might make it the case that “idealized you” doesn’t have it. Similarly, Yudkowsky’s “coherent extrapolated volition” appeals to the attitudes you would have if you were “more the person you wished you were.”

At a glance, this seems an attractive response, and one resonant with a broader subjectivist vibe. However, it also faces a number of problems.

First: just as actual you might be internally conflicted about your object-level values (conflicts we hoped the idealization procedure would resolve), so too might actual you be internally conflicted about the procedural values bearing on the choice of idealization procedure. Perhaps, for example, there isn’t currently a single form of reflective equilibrium that you endorse, treat as authoritative, etc; perhaps there isn’t a single idealized self that you “wish you were,” a single set of desires you “wish you had.” Rather, you’re torn, at a meta-level, about the idealization procedures you want to govern you. If so, there is some temptation, on pain of indeterminacy, to look to an idealization procedure to resolve this meta-conflict, too; but what type of idealization procedure to use is precisely what you’re conflicted about (compare: telling a group torn about the best voting procedure to “vote on it using the best procedure”).

Indeed, it can feel like proponents of this version of the view hope, or assume, that you are in some sense already engaged in, or committed to, a determinate decision-making process of forming/scrutinizing/altering your values, which therefore need only be “run” or “executed.” Uncertainty about your values, on this picture, is just logical uncertainty about what the “figure out my values computation” you are already running will output. The plan is in place. Idealization executes.

But is this right? Clearly, most people don’t have very explicit plans in this vein. At best, then, such plans must be implicit in their tangle of cognitive algorithms. Of course, it’s true that if put in different fully-specified situations, given different reflective resources, and forced to make different choices given different constraints, there is in fact a thing a given person would do. But construing these choices as the implementation of a determinate plan/decision-procedure (as opposed to e.g., noise, mistakes, etc), to be extrapolated into some idealized limit, is, at the least, a very substantive interpretative step, and questions about indeterminacy and path dependence loom large. Perhaps, for example, what sort of moral parliament Bob decides to set up, in different situations, depends on the weather, or on what he had for breakfast, or on which books he read in what order, and so on. And perhaps, if we ask him which such situation he meta-endorses as most representative of his plan for figuring out his values, he’ll again give different answers, given different weather, breakfasts, books, etc — and so on.

(Perhaps we can just hope that this bottoms out, or converges, or yields patterns/forms of consensus robust enough to interpret and act on; or perhaps, faced with such indeterminacy, we can just say: “meh.” I discuss responses in this vein in section IX.)

Second (though maybe minor/surmountable): even if your actual attitudes yield determinate verdicts about the authoritative form of idealization, it seems like we’re now giving your procedural/meta evaluative attitudes an unjustified amount of authority relative to your more object-level evaluative attitudes. That is, we’re first using your procedural/meta evaluative attitudes to fix an idealization procedure, then judging the rest of your attitudes via reference to that procedure. But why do the procedural/meta attitudes get such a priority?

This sort of issue is most salient in the context of cases like the “vicious desires” one above. E.g., if you have (a) an object-level desire that your co-worker suffer, and (b) a meta-desire not to have that object-level desire, why do we choose an “ideal you” in which the former is extinguished, and the latter triumphant? Both, after all, are just desires. What grants meta-ness such pride of place?

Similarly, suppose that your meta-preferences about idealization give a lot of weight to consistency/coherence — but that consistency/coherence will require rejecting some of your many conflicting object-level desires/intuitions. Why, then, should we treat consistency/coherence as a hard constraint on “ideal you,” capable of “eliminating” other values whole hog, as opposed to just one among many other values swirling in the mix?

(Not all idealizers treat consistency/coherence in this way; but my sense is that many do. And I do actually think there’s more to say about why consistency/coherence should get pride of place, though I won’t try to do so here.)

Third: fixing the idealization procedure via reference to your actual (as opposed to your idealized) evaluative attitudes risks closing off the possibility of making mistakes about the idealization procedure you want to govern you. That is, this route can end up treating your preferences about idealization as “infallible”: they fix the procedure that stands in judgment over the rest of your attitudes, but they themselves cannot be judged. No one watches the watchmen.

One might have hoped, though, to be able to evaluate/criticize one’s currently preferred idealization procedures, too. And one might’ve thought the possibility of such criticism truer to our actual patterns of uncertainty and self-scrutiny. Thus: if you currently endorse reflective equilibrium process X, but you learn that it implies an idealized you that gives up currently cherished value Y, you may not simply say: “well, that’s the reflective equilibrium process I endorse, so there you have it: begone, Y.” Rather, you can question reflective equilibrium process X on the very grounds that it results in giving up cherished value Y — that is, you can engage in a kind of meta-reflective equilibrium, in which the authority of a given process of reflective equilibrium is itself subject to scrutiny from the standpoint of the rest of what you care about.

Indeed, if I was setting off on some process of creating my own “moral parliament,” or of modifying myself in some way, then even granted access to “full information,” I can well imagine worrying that the parliament/self I’m creating is of the wrong form, and that the path I’m on is the wrong one. (This despite the fact that I can accurately forecast its results before going forward — just as I can accurately forecast that, after reading the egoist novel, or entering the experience machine, I’ll come out with a certain view on the other end. Such forecasts don’t settle the question.)

We think of others as making idealization procedure mistakes, too. Note, for example, the tension between appealing to your actual attitudes towards idealization, and the (basically universal?) requirement that the idealized self possess something like full (or at least, much more) information. Certain people, for example, might well endorse idealization processes that lock in certain values and beliefs very early, and that as a result never reach any kind of fully informed state: rather, they arrive at a stable, permanently ignorant/deceived equilibrium well before that. Similarly, certain people’s preferred idealization procedures might well lead them directly into the maw of some memetic hazard or other (“sure, I’m happy to look at the whirling pixels”).

Perhaps we hope to save such people, and ourselves, from such (grim? ideal?) fates. We find ourselves saying: “but you wouldn’t want to use that idealization procedure, if you were more idealized!”. Let’s turn to this kind of thought, now.

VIII. Appeals to idealized attitudes

Faced with these problems with fixing the idealization procedure via reference to our actual evaluative attitudes, suppose we choose instead to appeal to our idealized evaluative attitudes. Naive versions of this, though, are clearly and problematically circular. What idealization determines what’s of value? Well, the idealization you would decide on, if you were idealized. Idealized how? Idealized in the manner you would want yourself to be idealized, if you were idealized. Idealized how? And so on. (Compare: “the best voting procedure is the one that would be voted in by the best voting procedure.”)

Of course, some idealization procedures could be self-ratifying, such that if you were idealized in manner X, you would choose/desire/endorse idealization process X. But it seems too easy to satisfy this constraint: if after idealization process X, I end up with values Y, then I can easily end up endorsing idealization process X, since this process implies that pursuing Y is the thing for me to do (and I’m all about pursuing Y); and this could hold true for a very wide variety of values resulting from a very wide variety of procedures. So “value is determined by the evaluative attitudes that would result from an idealization procedure that you would choose if you underwent that very procedure” seems likely to yield wildly indeterminate results; and more importantly, its connection with what you actually care about now seems conspicuously tenuous. If I can brainwash you into becoming a paperclip maximizer, I can likely do so in a way that will cause you to treat this very process as one of “idealization” or “seeing the light.” Self-ratification is too cheap.

Is there a middle ground, here, between using actual and idealized attitudes to fix the idealization procedure? Some sort of happy mix? But which mix? Why?

In particular, in trying to find a balance between endless circles of idealization, and “idealized as you want to be, period,” I find that I run into a kind of “problem of arbitrary non-idealization,” pulling me back towards the circle thing. Thus, for example, I find that at every step in the idealization process I’m constructing, it feels possible to construct a further process to “check”/“ratify” that step, to make sure it’s not a mistake. But this further process will itself involve steps, which themselves could be mistakes, and which themselves must therefore be validated by some further process — and so on, ad infinitum. If I stop at some particular point, and say “this particular process just isn’t getting checked. This one is the bedrock,” I have some feeling of: “Why stop here? Couldn’t this one be mistaken, too? What if I wouldn’t want to use this process as bedrock, if I thought more about it?”

Something similar holds for particular limitations on e.g. the time and other resources available. Suppose you tell me: “What’s valuable, relative to you, is just what you’d want if ten copies of you thought about it for a thousand years, without ever taking a step of reasoning that another ten copies wouldn’t endorse if they thought about that step for a thousand years, and that’s it. Done.” I feel like: why not a hundred copies? Why not a billion years? Why not more levels of meta-checking? It feels like I’m playing some kind of “name the largest number” game. It feels like I’m building around me an unending army of ethereal Joes, who can never move until all the supervisors arrive to give their underlings the go-ahead, but everyone can never arrive, because there’s always room for more.

Note that the problem here isn’t about processes you might run or compute, in the actual world, given limited resources. Nor is it about finding a process that you’d at least be happy deferring to, over your current self; a process that is at least better than salient alternatives. Nor, indeed, is the problem “how can I know with certainty that my reasoning process will lead me to the truth” (there is no independent truth, here). Rather, the problem is that I’m supposed to be specifying a fully idealized process, the output of which constitutes the evaluative truth; but for every such process, it feels like I can make a better one; any given process seems like it could rest on mistakes that a more exhaustive process would eliminate. Where does it stop?

IX. Hoping for convergence, tolerating indeterminacy

One option, here, is to hope for some sort of convergence in the limit. Perhaps, we might think, there will come a point where no amount of additional cognitive resources, levels of meta-ratification, and so on will alter the conclusion. And perhaps indeed — that would be convenient.

Of course, there would remain the question of what sort of procedure or meta-procedure to “take the limit” of. But perhaps we can pull a similar move there. Perhaps, that is, we can hope that a very wide variety of candidate procedures yield roughly similar conclusions, in the limit.

Indeed, in general, for any of these worries about indeterminacy, there is an available response to the effect that: “maybe it converges, though?” Maybe as soon as you say “what Joe would feel if he really understood,” you home in on a population of Galaxy Joes that all possess basically the same terminal values, or on a single Galaxy Joe who provides a privileged answer. Maybe Bob’s preferences about idealization procedures are highly stable across a wide variety of initial conditions (weather, breakfasts, books, etc). Maybe it doesn’t really matter how, and in what order, you learn, read, experience, reflect: modulo obvious missteps, you end up in a similar place. Maybe indeed.

Or, if not, maybe it doesn’t matter. In general, lots of things in life, and especially in philosophy, are vague to at least some extent; arguments to the effect that “but how exactly do you define X? what about Y edge case?” are cheap, and often unproductive; and there really are bald people, despite the indeterminacy of exactly who qualifies.

What’s more, even if there is no single, privileged idealized self, picked out by a privileged idealization procedure, and even if the many possible candidates for procedures and outputs do not converge, it seems plausible that there will still be patterns and limited forms of consensus. For example, it seems unlikely that many of my possible idealized selves end up trying to maximize helium, or to eat as many bricks as they can; even if a few go one way, the preponderance may go some other way; and perhaps it’s right to view basically all of them, despite their differences, as worthy of deference from the standpoint of my actual self, in my ignorance (e.g., perhaps the world any of them would create is rightly thought better, from my perspective, than the world I would create, if I wasn’t allowed further reflection).

In this sense, the diverging attitudes of such selves may still be able to play some of the role the idealizer hopes for. That is, pouring my resources into eating bricks, torturing cats, etc really would be a mistake, for me — none of my remotely plausible idealized selves are into it — despite the fact that these selves differ in the weight they give to [incomprehensible galaxy-brained concept] vs. [another incomprehensible galaxy-brained concept]. And while processes that involve averaging between idealized selves, picking randomly amongst them, having them vote/negotiate, putting them behind veils of ignorance, etc raise questions about circularity/continuing indeterminacy, that doesn’t mean that all such processes are on equal footing (e.g., different parties can be unsure what voting procedure to use, while still being confident/unanimous in rejecting the one that causes everyone to lose horribly).

Perhaps, then, the idealizer’s response to indeterminacy — even very large amounts of it — should simply be tolerance. Indeed, there is an art, in philosophy, to not nitpicking too hard — to allowing hand-waves, and something somethings, where appropriate, in the name of actually making progress towards some kind of workable anything. Perhaps some of the worries above have fallen on the wrong side of the line. Perhaps a vague gesture, a promissory note, in the direction of something vaguely more ideal than ourselves is, at least in practical contexts (though this isn’t one), good enough; better than nothing; and better, too, than setting evaluative standards relative to our present, decidedly un-ideal selves, in our ignorance and folly.

X. Passive and active ethics

I want to close by gesturing at a certain kind of distinction — between “passive” and “active” ethics (here I’m drawing terminology and inspiration from a paper of Ruth Chang’s, though the substance may differ) — which I’ve found helpful in thinking about what to take away from the worries just discussed.

Some idealizing subjectivists seem to hope that their view can serve as a kind of low-cost, naturalism-friendly substitute for a robustly realist meta-ethic. That is, modulo certain extensional differences about e.g. ideally-coherent suffering maximizers, they basically want to talk about value in much the way realists do, and to differ, only, when pressed to explain what makes such talk true or false.

In particular, like realists, idealizers can come to see every (or almost every) choice and evaluative attitude as attempting to approximate and conform to some external standard, relative to which the choice or attitude is to be judged. Granted, the standard in question is defined by the output of the idealization procedure, instead of the robustly real values; but in either case, it’s something one wants to recognize, receive, perceive, respond to. For us non-ideal agents, the “true values” are still, effectively, “out there.” We are, in Chang’s terminology, “passive” with respect to them.

But instructively, I think, naive versions of this can end up circular. Consider the toy view that “what’s good is whatever you’d believe to be good if you had full information.” Now suppose that you get this full information, and consider the question: is pleasure good? Well, this just amounts to the question: would I think it good if I had full information? Well, here I am with full information. Ok, do I think it good? Well, it’s good if I would think it good given full information. Ok, so is it good? And so on.

Part of the lesson here is that absent fancier footwork about what evaluative belief amounts to, belief isn’t a good candidate for the evaluative attitude idealization should rest on. But consider a different version: “what you should do is whatever you would do, given full information.” Suppose that here I am with full information. I ask myself: what should I do? Well, whatever I would do, given full information. Ok, well, I’ve got that now. What would I do, in precisely this situation? Well, I’m in this situation. Ok, what would I do, if things were like this? Well, I’d try to do what I should do. And what should I do? Etc.

The point here isn’t that there’s “no way out,” in these cases: if I can get myself to believe, or to choose, then I will, by hypothesis, have believed truly, chosen rightly. Nor, indeed, need all forms of idealizing subjectivism suffer from this type of problem (we can appeal, for example, to attitudes that plausibly arise more passively and non-agentically, like desire).

Rather, what I’m trying to point at is a way that importing and taking for granted a certain kind of realist-flavored ethical psychology can result in an instructive sort of misfire. Something is missing, in these cases, that I expect the idealizing subjectivist needs. In particular: these agents, to the end, lack an affordance for a certain kind of direct, active agency — a certain kind of responsibility, and self-creation. They don’t know how to choose, fully, for themselves. Rather, even in ideal conditions, they are forever trying to approximate something else. True, on idealizing subjectivism, the thing they are trying to approximate is ultimately, themselves, in those conditions. But this is no relief: still, they are approximating an approximator, of an approximator, and so on, in an endless loop. They are always looking elsewhere, forever down the hall of mirrors, around and around a maze with no center (what’s in the center?). Their ultimate task, they think, is to obey themselves. But they can only obey: they cannot govern, and so have no law.

It’s a related sort of misfire, I think, that gives rise to the “would an endless army of ethereal Joes ratify every step of my reasoning, and the reasoning of the ratifiers, and so on?” type of problem I discussed above. That is, one wants every step to conform to some external standard — and the only standards available are built out of armies of ethereal Joes. But those Joes, too, must conform. It’s conformity all the way down — except that for the anti-realist, there’s no bottom.

What’s needed, here, is a type of choice that is creating, rather than trying to conform — and which hence, in a sense, is “infallible.” And here perhaps one thinks, with the realists: surely the types of choices we’re interested in here — choices about which books, feelings, machines, galaxy brains, Gods, to “trust”; which puppies, or nanomachines, to create — are fallible. Or if not, surely they are, in a sense, arbitrary — mere “pickings,” or “plumpings.” If you aren’t trying to conform to some standard, then how can you truly, and non-arbitrarily, choose? I don’t have a worked-out story, here (though I expect that we can at least distinguish such creative choices from e.g. Buridan’s-ass style pickings — for example, they don’t leave you indifferent). But it’s a question that I think subjectivists must face; and which I feel some moderate optimism about answering (though perhaps not in a way that gives realists what they want).

Of course, subjectivists knew, all along, that certain things about themselves were going to end up being treated as effectively infallible, from an evaluative perspective. Whatever goes in Clippy’s utility function slot, for subjectivists, governs what’s valuable relative to Clippy; and it does so, on subjectivism, just in virtue of being there — in virtue of being the stuff that the agent is made out of (this is part of the arbitrariness and contingency that so bothers realists). The problem that the idealizer faces is that actual human agents are not yet fully made: rather, they’re still a tangled mess. But the idealizer’s hope is that they’re sufficiently “on their way to getting made” that we can, effectively, assume they’re already there; the seed has already determined a tree, or a sufficiently similar set of trees; we just haven’t computed the result.

But is that how trees grow? Have you already determined a self? Have you already made what would make you, if all went well? Do you know, already, how to figure out who you are? Perhaps for some the answer is yes, or close enough. Perhaps for all. In that case, you are already trying to do something, already fighting for something — and it is relative to that something that you can fail.

But if the choice has not yet been made, then it is we who will have to make it. If the sea is open, then so too is it ours to sail.

Indeed, even if in some sense, the choice has been made — even if there is already, out there, a privileged idealized version of yourself; even if all of the idealization procedures converge to a single point — the sea, I think, is still open, if you step back and make it so. You can still reject that self, and the authority of the procedure(s) that created it, convergence or no. Here I think of a friend of mine, who expressed some distress at the thought that his idealized self could in principle turn out to be a Voldemort-like character. His distress, to me, seemed to assume that his idealized self was “imposed on him”; that he “had,” as it were, to acknowledge the authority of his Voldemort self’s values. But such a choice is entirely his. He can, if he wishes, reject the Voldemort, and the parts of himself (however strong) that created it; he can forge his own path, towards a new ideal. The fact that he would become a Voldemort, under certain conditions he might’ve thought “ideal,” is ultimately just another fact, to which he himself must choose how to respond.

Perhaps some choices in this vein will be easier, and more continuous/resonant with his counterfactual behavior and his existing decision-making processes; some paths will be harder, and more fragile; some, indeed, are impossible. But these facts are still, I think, just facts; the choice of how to respond to them is open. The point of subjectivism is that the standards (relative to you) used to evaluate your behavior must ultimately be yours; but who you are is not something fixed, to be discovered and acknowledged by investigating what you would do/feel in different scenarios; rather, it is something to be created, and choice is the tool of creation. Your counterfactual self does not bind you.

In a sense, what I’m saying here is that idealizing subjectivism is, and needs to be, less like “realism-lite,” and more like existentialism, than is sometimes acknowledged. If subjectivists wish to forge, from the tangled facts of actual (and hypothetical) selfhood, an ideal, then they will need, I expect, to make many choices that create, rather than conform. And such choices will be required, I expect, not just as a “last step,” once all the “information” is in place, but rather, even in theory, all along the way. Such choice, indeed, is the very substance of the thing.

(To be clear: I don’t feel like I’ve worked this all out. Mostly, I’ve been trying to gesture at, and inhabit, some sort of subjectivist existentialist something, which I currently find more compelling than a more realist-flavored way of trying to be an idealizer. What approach to meta-ethics actually makes most sense overall and in practice is a further question.)

XI. Ghost civilizations

With this reframing in mind, some of the possible circles and indeterminacies discussed above seem to me less worrying — rather, they are just more facts, to be responded to as I choose. Among all the idealized selves (and non-selves), and all combinations, there is no final, infallible evaluative authority — no rescuer, Lord, father; no safety. But there are candidate advisors galore.

Here’s an illustration of what I mean, in the context of an idealization I sometimes think about.

I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which can travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and play forward/rewind different counterfactual timelines (the ghost’s activity somehow doesn’t have any moral significance).

I sometimes treat such a ghost kind of like an idealized self. It can see much that I cannot. It can see directly what a small part of the world I truly am; what my actions truly mean. The lives of others are real and vivid for it, even when hazy and out of mind for me. I trust such a perspective a lot. If the ghost would say “don’t,” I’d be inclined to listen.

As I usually imagine it, though, the ghost isn’t arbitrarily “ideal.” It hasn’t proved all the theorems, or considered all the arguments. It’s not all that much smarter than me; it can’t comprehend anything that I, with my brain, can’t comprehend. It can’t directly self-modify. And it’s alone. It doesn’t talk with others, or make copies of itself. In a sense, this relative mundanity makes me trust it more. It’s easier to imagine than a galaxy brain. I feel like I “know what I’m dealing with.” It’s more “me.”

We can imagine, though, a version of the thought experiment where we give the ghost more leeway. Let’s let it make copies. Let’s give it a separate realm, beyond the world, where it has access to arbitrary technology. Let’s let it interact with whatever actual and possible humans, past and future, that it wants, at arbitrary depths, and even to bring them into the ghost realm. Let’s let it make new people and creatures from scratch. Let’s let it try out self-modifications, and weird explorations of mind-space — surrounded, let’s hope, by some sort of responsible ghost system for handling explorations, new creatures, and so on (here I imagine a crowd of copy ghosts, supervising/supporting/scrutinizing an explorer trying some sort of process or stimulus that could lead to going off the rails). Let’s let it build, if it wants, a galaxy brain, or a parliament, or a civilization. And let’s ask it, after as much of all this as it wants, to report back about what it values.

If I try to make, of this ghost civilization, some sort of determinate, privileged ideal, which will define what’s of value, relative to me, I find that I start to run into the problems discussed above. That is, I start wondering about whether the ghost civilization goes somewhere I actually want; how much different versions of it diverge, based on even very similar starting points; how to fix the details in a manner that has any hope of yielding a determinate output, and how arbitrary doing so feels. I wonder whether the ghosts will find suitable methods of cooperating, containing memetic hazards, and so on; whether I would regret defining my values relative to this hazy thought experiment, if I thought about it more; whether I should instead be focusing on a different, even more idealized thought experiment; where the possible idealizing ends.

But if I let go of the thought that there is, or need be, a single “true standard,” here — a standard that is, already, for me, the be-all-end-all of value — then I feel like I can relate to the ghosts differently, and more productively. I can root for them, as they work together to explore the distant reaches of what can be known and thought. I can admire them, where they are noble, cautious, compassionate, and brave; where they build good institutions and procedures; where they cooperate. I can try, myself, to see through their eyes, looking out on the vastness of space, time, and the beings who inhabit it; zooming in, rewinding, examining, trying to understand. In a sense, I can use the image of them to connect with, and strengthen, what I myself value, now (indeed, I think that much actual usage of “ideal advisor” thought experiments, at least in my own life, is of this flavor).

And if I imagine the ghosts becoming more and more distant, alien, and incomprehensible, I can feel my confidence in their values begin to fray. Early on, I’m strongly inclined to defer to them. Later, I am still rooting for them; but I start to see them as increasingly at the edges of things, stepping forward into the mist; they’re weaving on a tapestry that I can’t see, now; they’re sailing, too, on the open sea, further than I can ever go. Are they still good, relative to me? Have they gone “off the rails”? The question itself starts to fade, too, and with it the rails, the possibility of mistake. Perhaps, if necessary, I could answer it; I could decide whether to privilege the values of some particular ghost civilization, however unrecognizable, over my own current feelings and understanding; but answering is increasingly an act of creation, rather than an attempt at discovery.

Certainly, I want to know where the ghost civilization goes. Indeed, I want to know where all the non-Joe civilizations, ghostly or not, go too. I want to know where all of it leads. And I can choose to defer to any of these paths, Joe or non-Joe, to different degrees. I’m surrounded, if I wish to call on them, by innumerable candidate advisors, familiar and alien. But the choice of who, if any of them, to listen to, is mine. Perhaps I would choose, or not, to defer, given various conditions. Perhaps I would regret, or not; would kick myself, or not; would rejoice, or not. I’m interested to know that, too. But these “woulds” are just more candidate advisors. It’s still on me, now, in my actual condition, to choose.

(Thanks to Katja Grace, Ketan Ramakrishnan, Nick Beckstead, Carl Shulman, and Paul Christiano for discussion.)

Lukas_Gloor, Jun 22 2021

I think this post is brilliant!

I plan to link to it heavily in an upcoming piece for my moral anti-realism sequence.

On X., Passive and active ethics:

Rather, what I’m trying to point at is a way that importing and taking for granted a certain kind of realist-flavored ethical psychology can result in an instructive sort of misfire. Something is missing, in these cases, that I expect the idealizing subjectivist needs. In particular: these agents, to the end, lack an affordance for a certain kind of direct, active agency — a certain kind of responsibility, and self-creation. They don’t know how to choose, fully, for themselves.

Yeah, I think there's a danger for people who expect that "having more information," or other features of some idealized reflection procedure, would change the phenomenology of moral reasoning, such that once they're in the reflection procedure, certain answers will stick out to them. But, as you say, this point may never come! So instead, it could continue to feel like one has to make difficult judgment calls left and right, with no guarantee that one is doing moral reasoning "the right way."

(In fact, I'm convinced such a phase change won't come. I have a draft on this.)

In a sense, what I’m saying here is that idealizing subjectivism is, and needs to be, less like “realism-lite,” and more like existentialism, than is sometimes acknowledged.

I've also used the phrase "more like existentialism" in this context. :)

On IX., Hoping for convergence, tolerating indeterminacy:

This is an excellent strategy for people who find themselves without strong object-level intuitions about their goals/values. (Or people who only have strong object-level intuitions about some aspects of their goals/values, but not the details. E.g., being confident that one would want to be altruistic, but unsure about population ethics or different theories of well-being. [In these cases, perhaps with a guarantee for the reflection procedure not to change the overarching objective – being altruistic, or finding a suitable theory of well-being, etc.])

Some people would probably argue that "Hoping for convergence, tolerating indeterminacy" is the rational strategy in the light of our metaethical uncertainty. (I know you're not necessarily saying this in your post.) For example, they might argue as follows:

"If there's convergence among reflection procedures, I miss out if I place too much faith in my object-level intuitions and already formed moral convictions. By contrast, if there's no convergence, then it doesn't matter – all outcomes would be on the same footing."

I want to push back against this stance, "rationally mandated wagering on convergence." I think it only makes sense for people whose object-level values are still under-defined. By contrast, if you find yourself with solid object-level convictions about your values, then you not only stand to gain something from wagering on convergence. You also stand to lose something. You might be giving up something you feel is worth fighting for to follow the kind-of-arbitrary outcome of some reflection procedure.

My point is, the currencies are commensurable: What's attractive about the possibility of many reflection procedures converging is the same thing that's attractive to people who already have solid object-level convictions about their values (assuming they're not making one of the easily identifiable mistakes, i.e., assuming that, for them, there'd be no convergence among reflection procedures that are open-ended enough to get them to adopt different values). Namely, when they reflect to the best of their abilities, they feel drawn to certain moral principles or goals or specific ways of living their lives.

In other words, the importance of moral reflection for someone is exactly proportional to their credence in it changing their thinking. If someone feels highly uncertain, they almost exclusively have things to gain. By contrast, the more certain you already are in your object-level convictions, the larger the risk that deferring to some poorly understood reflection procedure would lead you to an outcome that constitutes a loss, in a sense relevant to your current self. Of course, one can always defer to conservative reflection procedures, i.e., procedures where one is fairly confident that they won't lead to drastic changes in one's thinking. Those could be used to flesh out one's thinking in places where it's still uncertain (and therefore, possibly, under-defined), while protecting convictions that one would rather not put at risk.

Joe_Carlsmith, Jun 24 2021

I'm glad you liked it, Lukas. It does seem like an interesting question how your current confidence in your own values relates to your interest in further "idealization," of what kind, and how much convergence makes a difference. Prima facie, it does seem plausible that greater confidence speaks in favor of "conservatism" about what sorts of idealization you go in for, though I can imagine very uncertain-about-their-values people opting for conservatism, too. Indeed, it seems possible that conservatism is just generally pretty reasonable, here.

richard_ngo, Jun 22 2021

Fantastic post. A few scattered thoughts inspired by it:

If you aren’t trying to conform to some standard, then how can you truly, and non-arbitrarily, choose?

Why does our choice need to be non-arbitrary? If we take certain intuitions/desires/instincts as primitives, they may be fundamentally arbitrary, but that's because we are unavoidably arbitrary. Yet this arbitrary initial state is all we have to work from.

What’s needed, here, is a type of choice that is creating, rather than trying to conform — and which hence, in a sense, is “infallible.”

It feels like infallible is the wrong type of description here, for the same reason that it would be odd to say that my taste in food is infallible. At a certain level the predicate "correct" will stop making sense. (Maybe that level isn't the level of choices, though; maybe it's instincts, or desires, or intuitions, or tastes - things that we don't see ourselves as having control over.)

Joe_Carlsmith, Jun 24 2021

Thanks, Richard :). Re: arbitrariness, in a sense the relevant choices might well end up arbitrary (and as you say, subjectivists need to get used to some level of unavoidable arbitrariness), but I do think that it at least seems worth trying to capture/understand some sort of felt difference between e.g. picking between Buridan's bales of hay, and choosing e.g. what career to pursue, even if you don't think there's a "right answer" in either case.

I agree that "infallible" maybe has the wrong implications, here, though I do think that part of the puzzle is the sense in which these choices feel like candidates for mistake or success; e.g., if I choose the puppies, or the crazy galaxy Joe world, I have some feeling like "man, I hope this isn't a giant mistake." That said, things we don't have control over, like desires, do feel like they have less of this flavor.

Lukas_Gloor, Jul 7 2021

Here's another issue:

Lack of morally urgent causes

In the blogpost On Caring, Nate Soares writes: “It's not enough to think you should change the world — you also need the sort of desperation that comes from realizing that you would dedicate your entire life to solving the world's 100th biggest problem if you could, but you can't, because there are 99 bigger problems you have to address first.” In the moral-reflection environment, the world is on pause. If you’ve suffered from poverty, illnesses or abuse in your life, these things are no longer an issue. Also, there are no people to lift out of poverty and no factory farms to shut down. You’re no longer in a race against time to prevent bad things from happening, seeking friends and allies while trying to defend your cause against corrosion from influence-seeking people.

Without morally urgent causes, it’s harder to form a strong identity around wanting to do good. It’s still morally important what you decide – after all, your deliberations in the reflection procedure determine how to allocate your caring capacity. Still, you’re deliberating about how to do that from a perspective where everything is well. For better or worse, this perspective could change the nature of moral reflection as compared to how people adopt moral convictions in real-life conditions.

Lukas_Gloor, Dec 24 2021

The above comment is probably hard to understand. Here's a better explanation of what I wanted to say:

Lack of morally urgent causes:
In the blogpost On Caring, Nate Soares writes: “It's not enough to think you should change the world — you also need the sort of desperation that comes from realizing that you would dedicate your entire life to solving the world's 100th biggest problem if you could, but you can't, because there are 99 bigger problems you have to address first.”

In that passage, Soares points out that desperation can be a strong motivating factor in why some people develop an identity around effective altruism. Interestingly enough, in some moral reflection procedures, the outside world is on pause. Reflection procedures are thinking-and-acting sequences we'd undergo if we had ample time and resources (no opportunity costs from urgent moral issues!).

When you're in the reflection procedure, there’s no reason to experience the phenomenology of “desperation” that Soares describes. If you’ve suffered from poverty, illnesses or abuse, these hardships are no longer an issue. Also, there are no people to lift out of poverty and no factory farms to shut down. You’re no longer in a race against time to prevent bad things from happening, seeking friends and allies while trying to defend your cause against corrosion from influence seekers. Without morally urgent causes, it’s less motivating to go all-out by adopting an identity around some viscerally motivating, morality-inspired life goal.

Instead, reflection inside the reflection procedure may feel more like writing that novel you’ve always wanted to write – it has less the feel of a “mission,” and more the feel of “doing justice to your long-term dream.”

Note that whether this is a good or bad thing is an open question. It still remains morally important what you decide in the reflection procedure. The stakes remain high because your deliberations determine how to allocate your caring capacity. Still, you’re deliberating from a perspective where everything is well, so what’s missing is moral urgency. Unless you take counteracting measures, I could imagine that you’re more likely to form an identity as “someone who prevents plans for future utopia from going poorly” than “someone who addresses ongoing/immediately foreseeable risks or injustices.”

For better or worse, the state of the world in the reflection procedure could change the nature of your moral reflection (as compared to how people adopt strong moral convictions in our more familiar circumstances).

Vhanon, Jun 25 2021

I feel you've been discussing how confusing the consequences of the definition above are. Then, why don't you just drop the definition and revise it?

I would propose: X is intrinsically valuable, relative to an agent A belonging to a close-influence set of agents S, if and only if, and because, A and all the agents in S would have some set of evaluative attitudes toward X, if A and all agents in S had undergone some sort of idealization procedure.

And by close-influence set, I mean a set of agents that cannot be influenced by anything outside the set.

I think that most of the concerns you are describing come from assuming that the idealisation process is personal and that there are multiple idealised evaluative attitudes toward something.

When you assume one unique evaluation, you can view the agents as all trying to discover it. At the end of the process, there are no further questions, everybody agrees, and subjective is the same as objective. During the process, you have differences, personal evaluations, changes of heart, and all the chaos you describe.

Resources and time are probably too limited to carry out the idealised process on every possible object X, but hopefully as a human race we can discover unanimous agreement on one or two big important questions within the next few million years.

Let me build a story-case for unanimous convergence.

Imagine you are a troglodyte, and you are trying to assess how far the hunting ground is. You need to estimate where you are, and what time of day it is, because being at the wrong place at the wrong time means either meeting a stronger predator or missing your target. Now, how do you evaluate the way you measure time? Do you prefer looking at the sun, and is that a good idea on a cloudy day? Do you prefer to listen to your body's rhythm (when you are hungry)? Do you follow somebody's example? Do you look at the rain? Do you watch the behaviour of the animals around you? Do you draw symbols on the ground to recall your way? Do you break tree branches? Do you leave a trail of stones? Do you dig a road?

The troglodyte is probably going to be in a dilemma and debate the issue strongly with his clan-mates (as happens to the agents in many of the scenarios you discuss). Nowadays, how we measure time and map location in our everyday life is something we mostly all agree on.

Effective Altruism Forum