
(This is a repost of a short-form, which I realized might be worth making into its own post. It’s partly inspired by Greg Lewis’s recent post “Rational Predictions Often Update Predictably.”)[1]

The existential risk community’s level of concern about different possible risks is correlated with how hard-to-analyse these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:

  1. Unaligned artificial intelligence[2]
  2. Unforeseen anthropogenic risks (tied)
  2. Engineered pandemics (tied)
  4. Other anthropogenic risks
  5. Nuclear war (tied)
  5. Climate change (tied)

This isn’t surprising.

For a number of risks, when you first hear and think a bit about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance this clarity will make us less worried about it. We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.
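To make that update pattern concrete, here is a minimal numerical sketch. It uses the ~10%/90% figures above, plus the simplifying assumption that full clarity would settle the question one way or the other:

```python
# A minimal sketch of the update pattern described above.
# Start with a ~10% credence that some newly encountered risk is "major".
prior = 0.10

# Simplifying assumption: gaining full clarity would reveal the truth, so
#   - with probability ~0.90 we learn it is not a major risk (credence -> ~0)
#   - with probability ~0.10 we learn it is a major risk (credence -> ~1)
p_less_worried = 1 - prior                             # ~90% chance clarity reassures us
expected_posterior = (1 - prior) * 0.0 + prior * 1.0   # = 0.10, equal to the prior

print(p_less_worried, expected_posterior)
# The expected credence is unchanged, but the single most likely outcome of
# gaining clarity is a downward update. Risks we can actually analyze therefore
# tend to end up looking less scary than the ones we can't.
```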

In particular, our level of worry about different plausible existential risks is likely to roughly track our ability to analyze them (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).

Some plausible existential risks also are far easier to analyze than others. If you compare 80,000 Hours’ articles on climate change and artificial intelligence, for example, then I think it is pretty clear that people analyzing existential risks from climate change simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other species to survive under different local climate conditions. As a conceptual foundation, we are also lucky to have a set of precise and scientifically validated concepts (e.g. “temperature” and "sea-level") that we can use to avoid ambiguity in our analysis. And so on.

We’re in a much worse epistemic position when it comes to analyzing the existential risk posed by misaligned AI: we’re reliant on abstract arguments that use ambiguous concepts (e.g. “objectives” and “intelligence”), rough analogies, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different than future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if misaligned AI actually poses very little risk to continued human survival, it’s hard to see how we could become really confident of that.

Some upshots:

  1. The fact that the existential risk community is particularly worried about misaligned AI might largely reflect the fact that it’s hard to analyze risks from misaligned AI.

  2. Nonetheless, even if the above possibility is true, it doesn't at all follow that the community is irrational to worry more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not a far bigger deal than other risks. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to better-informed future observers — I'll probably look over-worried after the fact.”

  3. For hard-to-analyze risks, it matters a lot what your “prior” in the risks is (since detailed evidence, models, and arguments can only really move you so far from your baseline impression). I sometimes get the sense that some people are starting from a prior that’s not far from 50%: For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it's even a bit unclear what exactly a "prior" means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.[3]


  1. Clarification: The title of this post is using the word "expect" in the everyday sense of the word, rather than the formal probability theory sense of the word. A less ambiguous title might have been "We will predictably worry more about speculative risks." ↩︎

  2. Toby Ord actually notes, in the section of The Precipice that gives risk estimates: "The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book." ↩︎

  3. Of course, not everyone agrees that it’s so difficult to assess the risk from misaligned AI. Some people believe that the available arguments, evidence from evolution, and so on actually do count very strongly — or even nearly decisively — toward AI progress leading to human extinction by default. The argument I’ve made in this post doesn’t apply very well to this group. Rather, the argument applies to people who think of existing analysis of AI risk as suggestive, perhaps strongly suggestive, but still far from clearcut. ↩︎

Comments

I agree with the general point that because you predictably expect to update downwards with more information, the risks with the least information will tend to have larger estimates. But:

For a number of risks, when you first hear and think a bit about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes.

Really? I feel like there are so many things that could provoke that reaction from me, and I'd expect that for ~99% of them I'd later update to "no, it doesn't really seem plausible that's a huge threat to human survival". If we conservatively say that I'd update to 1% on those risks, and for the other ~1% where I updated upwards I'd update all the way to "definitely going to kill humanity", then my current probability should be upper bounded by 0.99 × 0.01 + 0.01 × 1 ≈ 0.02, or roughly 2%.

It does feel like you need quite a bit more than "hmm, maybe" to get to 10%. (Though note "a lot of people who have thought about it a bunch are still worried" could easily get you there.)

This seems a little ungenerous to the OP.

Minor and plausible parameter changes here get us back to their beliefs.

  • maybe we can accept that they didn’t encounter as many candidate x-risks, or have a different bar for them. So maybe it’s 10 risks they consider, and 1 of those (AI risk) is at 100%.
  • maybe for many of the other risks their valuation is 5-25% (because they have a different value system for what’s bad or leads to lock in).
[This comment is no longer endorsed by its author]

I’m a bit confused by this post. I’m going to summarize the main idea back, and I would appreciate it if you could correct me where I’m misinterpreting.

Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default. In reality, the probability of existential risk from particular causes is generally less than 10% [this feels like an implicit assumption], so finding more information about the risks causes us to decrease our worry about those risks. We can get more information about easier-to-analyze risks, so we update our probabilities downward after getting this correcting information, but for hard-to-analyze risks we do not get such correcting information, so we remain quite worried. AI risk is currently hard-to-analyze, so we remain in this state of prior belief (although the 10% part varies by individual, could be 50% or 2%).

I’m also confused about this part specifically: 

initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance this clarity will make us less worried about it

 – why is there a 90% chance that more information leads to less worry? Is this assuming that for 90% of risks, they have P(Doom) < 10%, and for the other 10% of risks P(Doom) ≥ 10%?

This is a helpful comment - I'll see if I can reframe some points to make them clearer.

Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default.

I'm actually not assuming human psychology is flawed. The post is meant to be talking about how a rational person (or, at least, a boundedly rational person) should update their views.

On the probabilities: I suppose I'm implicitly evoking both a subjective notion of probability ("What's a reasonable credence to assign to X happening?" or "If you were betting on X, what betting odds should you be willing to accept?") and a more objective notion ("How strong is the propensity for X to happen?" or "How likely is X actually?" or "If you replayed the tape a billion times, with slight tweaks to the initial conditions, how often would X happen?").[1] What it means for something to pose a "major risk," in the language I'm using, is for the objective probability of doom to be high.

For example, let's take existential risks from overpopulation. In the 60s and 70s, a lot of serious people were worried about near-term existential risks from overpopulation and environmental depletion. In hindsight, we can see that overpopulation actually wasn't a major risk. However, this wouldn't have been clear to someone first encountering the idea and noticing how many experts took it seriously. I think it might have been reasonable for someone first hearing about The Population Bomb to assign something on the order of a 10% credence to overpopulation being a major risk.

I think, for a small number of other proposed existential risks, we're in a similar epistemic position. We don't yet know enough to say whether it's actually a major risk, but we've heard enough to justify a significant credence in the hypothesis that it is one.[2]

why is there a 90% chance that more information leads to less worry? Is this assuming that for 90% of risks, they have P(Doom) < 10%, and for the other 10% of risks P(Doom) ≥ 10%?

If you assign a 90% credence to something not being a major risk, then you should assign a roughly 90% credence to further evidence/arguments helping you see that it's not a major risk. If you become increasingly confident that it's not a major risk, then your credence in doom should go down.


  1. You can also think of the objective probability as, basically, what your subjective probability should become if you gained access to dramatically more complete evidence and arguments. ↩︎

  2. The ~10% number is a bit arbitrary. I think it'd almost always be unreasonable to be close to 100% confident that something is a major existential risk, after hearing just initial rough arguments and evidence for it. In most cases - like when hearing about possible existential risks from honeybee collapse - it's in fact reasonable to start out with a credence below 1%. So, when I'm talking about risks that we should assign "something on the order of a 10% credence to," I'm talking about the absolute most plausible category of risks. ↩︎

One way to think about this phenomenon of reversion to the mean is in the bandit problem setting, where you are choosing between a bunch of levers to ascertain their payouts. This is not my area, so take this with a grain of salt, but here is my understanding. You can think of each lever as giving out payoffs with a normal distribution, whose mean and variance are themselves randomly initialised with some known distribution. There are many ways to choose a lever, with pretty different behaviour. You could (1) be a myopic Bayesian, and choose the lever with the highest expected immediate payoff, taking into account that levers with high variance probably aren't as good as they appear, (2) take a simplistic historical approach, and choose the lever with the highest payoff when you pulled it in the past, or (3) use the "upper confidence bound" algorithm, which chooses the lever for which the upper bound of your confidence interval of the payoff is the highest. It turns out that option (3) - which is pretty over-optimistic, and not very cautious, about your impact - converges to optimality, and does so more quickly than (noisy) variations of (1) and (2). If you're following a strategy like (3), then you'll switch a lot, and the levers that you pull will often not appear optimal in hindsight, but that's just the consequence of proper exploration.
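As a rough illustration, here is a minimal sketch of strategies (2) and (3) for a simple Gaussian bandit; the payoff distributions, the true means, and the exploration constant c are arbitrary choices rather than anything canonical:

```python
import math
import random

def pull(true_mean):
    """Simulate one pull of a lever whose payoffs are Gaussian around true_mean."""
    return random.gauss(true_mean, 1.0)

def choose_greedy(means):
    """Option (2): pick the lever with the highest observed average payoff so far."""
    return max(range(len(means)), key=lambda i: means[i])

def choose_ucb(counts, means, t, c=2.0):
    """Option (3): pick the lever whose upper confidence bound is highest."""
    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # try every untried lever at least once
        return means[i] + c * math.sqrt(math.log(t) / counts[i])
    return max(range(len(counts)), key=ucb)

# Usage sketch: three levers with hidden true means; run UCB and see where it settles.
true_means = [0.2, 0.5, 0.8]
counts = [0, 0, 0]
means = [0.0, 0.0, 0.0]
for t in range(1, 1001):
    i = choose_ucb(counts, means, t)     # swap in choose_greedy(means) to compare
    reward = pull(true_means[i])
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]  # running average of observed payoffs
print(counts)  # UCB explores early, then concentrates pulls on the best lever
```

The optimism baked into the confidence-bound term is what produces the behaviour the comment describes: uncertain levers get pulled more than their observed averages warrant, and many of those pulls look suboptimal in hindsight.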

NB. If any experts want to clarify/fix anything I've said then please do.

An idealised Bayesian would know that these "high upper confidence bound" levers probably pay off less than they appear. So although we spend more time thinking about or focusing on higher-variance risks, we should not fear or worry about them any extra.

The bandit problem is definitely related, although I'm not sure it's the best way to formulate the situation here. The main issue is that the bandit formulation, here, treats learning about the magnitude of a risk and working to address the risk as the same action - when, in practice, they often come apart.

Here's a toy model/analogy that feels a bit more like it fits the case, in my mind.

Let's say there are two types of slot machines: one that has a 0% chance of paying and one that has a 100% chance of paying. Your prior gives you a 90% credence that each machine is non-paying.[1]

Unfortunately: When you pull the lever on either machine, you don't actually get to see what the payout is. However, there's some research you can do to try to get a clearer sense of what each machine's "type" is.

And this research is more tractable in the case of the first machine. For example: Maybe the first machine has identifying information on it, like a model number, which might allow you to (e.g.) call up the manufacturer and ask them. The second machine is just totally nondescript.

The most likely outcome, then, is that you quickly find out that the first slot machine is almost certainly non-paying -- but continue to have around a 10% credence that the second machine pays.

In this scenario, you should keep pulling the lever on the second machine. You should also, even as a rational Bayesian, actually be more optimistic about the second machine.
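In numbers, the toy model works out as follows (a minimal sketch using the 90/10 prior above and treating the research step as fully resolving machine 1's type):

```python
# Two machines; each independently has a 10% prior probability of being the paying type.
p_paying_prior = 0.10

# Expected payout per pull before any research: the same for both machines.
ev_before = p_paying_prior * 1.0 + (1 - p_paying_prior) * 0.0   # 0.10

# Research is only tractable for machine 1. In the most likely branch (~90%),
# it reveals machine 1 to be non-paying, while machine 2 stays at the prior.
ev_machine_1 = 0.0              # resolved: almost certainly non-paying
ev_machine_2 = p_paying_prior   # unresolved: still 0.10

print(ev_before, ev_machine_1, ev_machine_2)
# Nothing about machine 2 has changed, yet it is now clearly the lever to pull,
# and the rational credence that it pays (0.10) exceeds the credence for machine 1.
```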

(By analogy, I think we actually should tend to fear speculative existential risks more.)


  1. A more sophisticated version of this scenario would have a continuum of slot machine types and a skewed prior over the likelihood of different types arising. ↩︎

Interesting, that makes perfect sense. However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research). 

Another way to explain why the uncertain risks look big would be that we are unable to stop society pulling the AI progress lever until we have proven it to be dangerous. Definitely-risky activities just get stopped! Maybe that's implicitly how your model gets the desired result.

However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research).

The story does require there to be only a very limited number of arms that we initially think have a non-negligible chance of paying. If there are unlimited arms, then one of them should be both paying and easily identifiable.

So the story (in the case of existential risks) is that there are only a very small number of risks that, on the basis of limited argument/evidence, initially seem like they might lead to extinction or irrecoverable collapse by default. Maybe this set looks like: nuclear war, misaligned AI, pandemics, nanotechnology, climate change, overpopulation / resource depletion.

If we're only talking about a very limited set, like this, then it's not too surprising that we'd end up most worried about an ambiguous risk.


Do you have a sense how this argument relates to Amanda Askell's argument for the importance of value of information?

I think we could probably invest a lot more time and resources in interventions that are plausibly good, in order to get more evidence about them. We should probably do more research, although I realise this point is somewhat self-serving. For larger donors, this probably means diversifying their giving more if the value of information diminishes steeply enough, which I think might be the case.

Psychologically, I think we should be a bit more resilient to failure and change. When people consider the idea that they might be giving to cause areas that could turn out to be completely fruitless, I think they find it psychologically difficult. In some ways, just thinking "Look, I'm just exploring this to get the information about how good it is, and if it's bad, I'll just change. Or, if it doesn't do as well as I thought, I'll just change." can be quite comforting if you worry about these things.

The extreme view that you could have is "We should just start investing time and money in interventions with high expected value, but little or no evidential support." A more modest proposal, that I tentatively endorse, is "We should probably start explicitly including the value of information in our assessment of causes and interventions, rather than treating it as an afterthought to concrete value." In my experience, information value can swamp concrete value; and if that is the case, it really shouldn't be an afterthought. Instead it should be one of the primary drivers of value, not an afterthought in your calculation summary.

Amanda is talking about the philosophical principle, whereas I'm talking about the algorithm that roughly satisfies it. The principle is that a non-myopic Bayesian will take into account not just the immediate payoff, but also the information value of an action. The algorithm - upper confidence bound - efficiently approximates this behaviour. The fact that UCB is optimistic (about its impact) suggests that we might want to behave similarly, in order to capture the information value. ("Information value of an action" and "exploration value" are synonymous here.)

I agree with the general thrust of the post, but when analyzing technological risks I think one can get substantial evidence by just considering the projected "power level" of the technology, while you focus on evidence that this power level will lead to extinction. I agree the latter is much harder to get evidence about, but I think the former is sufficient to be very worrisome without much evidence on the latter.

Specifically, re: AI you write:

we’re reliant on abstract arguments that use ambiguous concepts (e.g. “objectives” and “intelligence”), rough analogies, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different than future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us.

I roughly agree with all of this, but by itself the argument that we will within the next century plausibly create AI systems that are more powerful than humans (e.g. Ajeya's timelines report) seems like enough to get the risk pretty high. I'm not sure what our prior should be on existential risk conditioned on a technology this powerful being developed, but honestly starting from 50% might not be unreasonable.

Similar points were made previously e.g. by Richard Ngo with the "second species" argument, or by Joe Carlsmith in his report on x-risk from power-seeking AI: "Creating agents who are far more intelligent than us is playing with fire."
