
These are some selection effects that impact which ideas people tend to get exposed to and what they end up believing, in ways that make the overall epistemics worse. These have mostly occurred to me in the context of AI discourse (alignment research, governance, etc.), mostly on LessWrong. (They might not be exclusive to discourse on AI risk.)

(EDIT: I've reordered the sections in this post so that fewer people get stuck on what was the first section, and so that they have a better chance of reading the other two sections.)

Outside-view is overrated

In AI discourse, outside-view (basing one's opinion on other people's opinions and on (things that seem like) precedents), as opposed to inside-view (having an actual gears-level understanding of how things work), is being quite overrated for a variety of reasons.

  • There's the issue of outside-view double-counting, as in this comic I drew. When building an outside-view, people don't particularly check whether 10 people say the same thing because they each came up with it independently, or because 9 of them heard it from the 1 person who came up with it and themselves mostly stuck to outside-view (see the toy calculation after this list).
  • I suspect that outside-view is being over-valued because it feels safe — if you just follow what you believe to be consensus and/or an authority, then it can feel less like it's "your fault" if you're wrong. You can't really just rely on someone else's opinion on something, because they might be wrong, and to know if they're wrong you need an inside-view yourself. And there's a fundamental sense in which developing your own inside-view of AI risk is contributing to the research, whereas just reusing what exists is neutral, and {reusing what exists + amplifying it based on what has status or memetic virulence} is doing damage to the epistemic commons, due to things like outside-view double-counting.
  • There's occasionally a tendency to try to adopt the positions held by authority figures/organizations in order to appeal to them, to get resources/status, and/or generally to fit in. (Similarly, be wary of the opposite as well — having a wacky opinion in order to get quirkiness/interestingness points.)
  • "Precedents"-based ideas are pretty limited — there isn't much that looks similar to {us building things that are smarter than us and as-flexible-as-software}. The comparison with {humans as mesa-optimizers relative to evolution} has been taken way outside of its epistemic range.

Arguments about P(doom) are filtered for nonhazardousness

Some of the best arguments for high P(doom) / short timelines that someone could make would look like this:

It's not that hard to build an AI that kills everyone: you just need to solve [some problems] and combine the solutions. Considering how easy it is compared to what you thought, you should increase your P(doom) / shorten your timelines.

But obviously, if people had arguments of this shape, they wouldn't mention them, because they make it easier for someone to build an AI that kills everyone. This is great! Carefulness about exfohazards is better than the alternative here.

But people who strongly rely on outside-view for their P(doom) / timelines should be aware that the arguments they're exposed to are being filtered for nonhazardousness. Note that this plausibly applies to topics other than P(doom) / timelines.

Note that beyond not-being-mentioned, such arguments are also anthropically filtered against: in worlds where such arguments have been out there for longer, we died a lot quicker, so we're not there to observe those arguments having been made.

Confusion about the problem often leads to useless research

People enter AI risk discourse with various confusions, such as:

  • What are human values?
  • Aligned to whom?
  • What does it mean for something to be an optimizer?
  • Okay, unaligned ASI would kill everyone, but how?
  • What about multipolar scenarios?
  • What counts as AGI, and when do we achieve that?

Those questions about the problem do not particularly need fancy research to be resolved: for the examples above, they're either already solved or there's a good reason why thinking about them is not useful to the solution.

The answers (or reasons-why-answering-is-not-useful) usually make sense if you're familiar with rationality and alignment, but some people are still missing a lot of those basics, and by repeatedly voicing these confusions they cause others to think that the confusions are relevant and should be researched, which wastes a lot of time.

It should also be noted that some things are correct to be confused about. If you're researching a correlation or concept-generalization which doesn't actually exist in the territory, you're bound to get pretty confused! If you notice you're confused, ask yourself whether the question is even coherent/true, and whether figuring it out helps save the world.

Comments

What are human values?

We don't need to figure out this problem, we can just implement CEV without ever having a good model of what "human values" are.

Aligned to whom?

The vast majority of the utility you have to gain is from {getting a utopia rather than everyone-dying-forever}, rather than {making sure you get the right utopia}.

What does it mean for something to be an optimizer?

Expected utility maximization seems to fully cover this. More general models aren't particularly useful to saving the world.

For what it's worth, I have significant disagreements with basically all of your short replies to these basic questions, and I've been heavily engaged in AI alignment discussions for several years. So, I strongly disagree with your claim that these questions are "either already solved or there's a good reason why thinking about them is not useful to the solution", at least in the way you seem to think they have been solved.

I feel like they're at least solved-enough that they're not particularly what should be getting focused on. I predict that in worlds where we survive, spending time on those questions won't end up having cashed out into much value.

Executive summary: The post discusses three selection effects biasing AI risk discourse: overvaluing outside views, filtering arguments for safety, and pursuing useless research based on confusion.

Key points:

  1. Overreliance on outside views like consensus opinions double counts evidence and feels safer than developing independent expertise.
  2. Strong arguments for high extinction risk often look unsafe to share, so discourse misses hazardous insights.
  3. Confusions about core issues lead researchers down useless paths instead of focusing on decisive factors.
  4. Checking whether a question is coherent or helps save the world can avoid wasted effort.
  5. Tabooing terms like AGI may help avoid distraction on irrelevant definitional debates.
  6. Recognizing these selection effects can improve individual and collective epistemics.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
