It is common to argue for the importance of x-risk reduction by emphasizing the immense value that may exist over the course of the future, if the future comes. Famously, for instance, Nick Bostrom calculates that, even under relatively sober estimates of the number and size of future human generations, "the expected value of reducing existential risk by a mere one millionth of one percentage point is at least a hundred times the value of a million human lives".
[Note: People sometimes use the term "x-risk" to refer to slightly different things. Here, I use the term to refer to an event that would bring the value of future human civilization to roughly zero—an extinction event, a war in which we bomb ourselves back to the Stone Age and get stuck there forever, or something along those lines.]
Among those who take such arguments for x-risk reduction seriously, there seem to be two counterarguments commonly raised in response (e.g. here). First, the future may contain more pain than pleasure. If we think that this is likely, then, at least from the utilitarian perspective, x-risk reduction stops looking so great. Second, we may have opportunities to improve the trajectory of the future, such as by improving the quality of global institutions or by speeding up economic growth, and such efforts may have even higher expected value than (immediate) x-risk reduction. "Mundane" institution-building efforts may also have the benefit of reducing future catastrophic risks, should they arise.
It seems to me that there is another important consideration, currently neglected, which complicates the case for x-risk reduction efforts. The consideration is that, even if we think the value of the future is positive and large, the value of the future conditional on the fact that we marginally averted a given x-risk may not be. And in any event, these conditional values are bound to depend on the x-risk in question.
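In symbols (a rough statement of the same point, where $V$ is the value of the future if it comes and $A_i$ is the event that our effort against risk $i$ made the pivotal difference):

$$\mathbb{E}[V] > 0 \;\text{ does not imply }\; \mathbb{E}[V \mid A_i] > 0, \qquad \text{and in general } \mathbb{E}[V \mid A_i] \neq \mathbb{E}[V \mid A_j].$$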
For example: There are things we currently do not know about human psychology, some of which bear on how inclined we are toward peace and cooperation. Perhaps Steven Pinker is right, and violence will continue its steady decline, until one evening sees the world's last bar fight and humanity is at peace forever after. Or perhaps he's wrong—perhaps a certain measure of impulsiveness and anger will always remain, however favorable the environment, and these impulses are bound to crop up periodically in fights and mass tortures and world wars. In the extreme case, if we think that the expected value of the future (if it comes) is large and positive under the former hypothesis but large and negative under the latter, then the possibility that human rage may end the world is a source of consolation, not worry. It means that the existential risk posed by world war is serving as a sort of "fuse", turning off the lights rather than letting the family burn.
As an application: if we think the peaceful-psychology hypothesis is more likely than the violent-psychology hypothesis, we might think that the future has high expected value. We might thus consider it important to avert extinction events like asteroid impacts, which would knock out worlds "on average". But we might oppose efforts like the Nuclear Threat Initiative, which disproportionately save violent-psychology worlds. Or we might think that the sign of the value of the future is positive in either scenario, but judge that one x-risk is worth devoting more effort to than another, all else equal.
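To make the arithmetic behind this concrete, here is a minimal toy sketch. The hypotheses, values, and probabilities are entirely made up for illustration; nothing hangs on the particular numbers, only on the fact that a war-type risk is much likelier under the violent-psychology hypothesis while an asteroid impact is equally likely under both.

```python
# Toy model of the "which world gets saved" consideration.
# All numbers are made up purely for illustration.

hypotheses = {
    # name: (prior probability, value of the future if it comes,
    #        P(war-type x-risk), P(asteroid x-risk))
    "peaceful-psychology": (0.8, 100.0, 0.01, 0.01),
    "violent-psychology":  (0.2, -50.0, 0.50, 0.01),
}

def value_of_marginally_averting(risk: str) -> float:
    """Expected value saved by a pivotal intervention against `risk`.

    The intervention only matters in worlds where the risk would otherwise
    have struck, so each hypothesis is weighted by its prior times the
    probability of that risk under it. (Up to normalization, this is the
    Bayesian update on learning that our effort was pivotal.)
    """
    total = 0.0
    for prior, value, p_war, p_asteroid in hypotheses.values():
        p_risk = p_war if risk == "war" else p_asteroid
        total += prior * p_risk * value
    return total

print(value_of_marginally_averting("asteroid"))  # ~  0.7 (> 0)
print(value_of_marginally_averting("war"))       # ~ -4.2 (< 0)
# Conditioning on a pivotal near-miss war shifts weight toward the
# violent-psychology hypothesis, so the "saved" future looks worse.
```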
Once we start thinking along these lines, we open various cans of worms. If our x-risk reduction effort starts far "upstream", e.g. with an effort to make people more cooperative and peace-loving in general, to what extent should we take the success of the intermediate steps (which must succeed for the x-risk reduction effort to succeed) as evidence that the saved world would go on to a great future? Should we incorporate the fact of our own choice to pursue x-risk reduction itself into our estimate of the expected value of the future, as recommended by evidential decision theory, or should we exclude it, as recommended by causal? How should we generate all these conditional expected values, anyway?
Some of these questions may be worth the time to answer carefully, and some may not. My goal here is just to raise the broad conditional-value consideration which, though obvious once stated, so far seems to have received too little attention. (For reference: on discussing this consideration with Will MacAskill and Toby Ord, both said that they had not thought of it, and thought that it was a good point.) In short, "The utilitarian imperative 'Maximize expected aggregate utility!'" might not really, as Bostrom (2002) puts it, "be simplified to the maxim 'Minimize existential risk'". It may not be enough to do our best to save the world. We may also have to consider which world gets saved.
I'd guess that we don't have to think much about which world we're saving. My reasoning would be that the long-run expected value of the world is mostly predictable from macrostrategic, even philosophical, considerations. E.g., agents more often seek things that make them happier. The overall level of preference fulfilment that is physically possible might be very large. There's not much reason to think that pain is easier to create than pleasure (see [1] for an exploration of the question), and we'd expect the sum of recreation and positive incentivization to be greater than the amount of disincentivization (e.g. retribution or torture).
I think a proper analysis of this would vindicate the existential risk view as a simplification of maximize utility (modulo problems with infinite ethics (!)). But I agree that all of this needs to be argued for.
1. http://reflectivedisequilibrium.blogspot.com/2012/03/are-pain-and-pleasure-equally-energy.html
I agree that it’s totally plausible that, once all the considerations are properly analyzed, we’ll wind up vindicating the existential risk view as a simplification of “maximize utility”. But in the meantime, unless one is very confident or thinks doom is very near, “properly analyze the considerations” strikes me as a better simplification of “maximize utility”.
Even if you do think possible doom is near, you might want an intermediate simplification like "some people think about consequentialist philosophy while most mitigate catastrophes that would put this thinking process at risk".
Agreed.
Very interesting and well written, thank you for this important idea!
Playing around with toy examples, I noticed another version of this consideration. The success of some actions might imply that there is a higher likelihood of a dangerous world (for which there might be better alternative actions).
For example, consider allocating resources for handling bio-terrorism. One important question is how likely it is that there will be individuals who want to cause massive harm and are able to do so. If this resource allocation does indeed stop a global extinction, then we should update our estimated likelihood of such an individual existing. In that world, there may be better alternatives than focusing on something specific to current biological techniques; instead, we might put effort into methods which can help against any individual who wants to cause massive harm with any technology.
This calls to mind Steven Pinker's chapter on X-risk in Enlightenment Now. He argues that very few people in recent history have conducted individual terrorism, despite it being very easy to do so (one person with poison in one pharmacy could spark a $100 million drug recall, one person with a car in one city could kill dozens with little to no planning). As a result, he sees X-risk from individual actions as a relatively low priority, compared to other problems that could be caused by societal shifts (e.g. from a more enlightened world to a more postmodern or populist world).
I'd want to answer a lot more questions about people before I start agreeing with Pinker (for example, how many copycats might follow in the footsteps of the first successful terrorist of type X?). But I do think it's a good question to think about; if very few people are malicious in this sense, maybe we should put a relatively high priority on disasters caused by poor incentives or uncontrolled technology, rather than technology controlled by bad individuals or groups.
I was following until the last word. You've put forward some of Pinker's interesting arguments about not worrying much about individuals, but what are the arguments against worrying about "bad" groups?
Thanks for the question; I should have been more clear. By "groups" I mean small groups of people without specialized knowledge. In Pinker's model, a cell of five malicious people working together isn't much more dangerous than a single malicious person. Historically, people willing to sacrifice themselves to disrupt society haven't been very common or competent, so threats on the level of "what a few untrained people can do" haven't accounted for much damage, compared to threats from nations and from civilization itself.
This changes if a malicious person/small group has specialized experience (e.g. someone building a virus in their basement), but the lower the base rate of individual malice, the lower the chance that someone who gains this expertise will want to use it to hurt people, and the lower the chance that such people will find each other and form a group.
If Pinker is right that very few people want to cause as much harm as possible, we'd worry less about malicious people, whether alone or together, and worry more about threats caused by people who don't want to cause harm but have bad incentives, whether because of profit-seeking, patriotism, or other norms that aren't utilitarian. At least, that's my interpretation of the chapter.
I've been reading Phil Torres's book on existential risks, and I agree with him to the extent that people have been too dismissive about the number of omnicidal agents and their capability to destroy the world. I think his reaction to Pinker would be that the level of competence needed to create disruption is decreasing because of technological development; therefore, historical precedent is not a great guide. See: Who would destroy the world? Omnicidal agents and related phenomena
I don't know that I agree with Pinker; even if he's right about the low base rate, ideas that reassure us about the limited impact of people with guns and poison may not extend to omnicidal attacks. I'm still much more worried about skilled groups of people working within corporations and governments, but I assume that our threat profile will shift more toward individuals over time.
This also seems reminiscent of Bostrom's Vulnerable World Hypothesis (published a year after this thread, so fair enough that it didn't make an appearance here :D).
The most relevant part is Bostrom's "easy nukes" thought experiment.
Dear all, thanks for starting this thread. This is one of the most worrying problems that I have been pondering for the past few years.
1. I believe that, empirically speaking, Pinker is probably right to say that individuals are less likely to try to cause as much harm to the world as possible, and that the logical conclusion would be to focus more effort on countering malicious groups. However, I believe that a single unskilled individual with access to the highest concentration of destructive capacity known to the world could have even more potential to produce an event of x-risk intensity than a group or a nation of individuals could.
2. My own belief is that the world is static in condition, and that violence will continue on a steady declining trend only if intervened with, as pleasure is always harder to generate than pain, and people can eventually have an incentive to cause pain to others in order to generate pleasure ("utility") for themselves.
My thoughts on the dilemma:
I think it's always good to have a better estimate of the likelihood of the x-risk presented by individuals, but I would like to think that we should always have developed enough capacity to deal with the higher-potential x-risk events. I.e., if the nuclear switches, when triggered, would cause an x-risk event, will we have developed enough capacity (advanced technology or preventive measures) to stop that occurrence?
Thank you all very much; it's been a highly pleasurable and very thoughtful read.
Wei Lun
It is an interesting idea. If I remember correctly, something slightly similar was explored in the context of GCRs by Seth Baum. The case there: if the recovery time after a global catastrophic event is relatively short (compared to the background extinction rate), a violent civilization destroying itself before reaching the technology that would allow it to drive the whole species extinct may be a better outcome.
As a quick guess, I'd predict careful analysis of this style of considerations will lead to more emphasis on risks from AI and less on nuclear. Within AI safety it would likely lead to less emphasis on approaches which run the risk of creating "almost aligned" AI - the adjustments would be somewhat similar to what the negative utilitarian camp says, just much less extreme.
I'm slightly skeptical that studying this consideration in more depth than a few days is marginally effective. The reason is that the study will run into questions like "should we expect a society recovering after near-extinction due to a bio-attack to be in a better or worse position to develop grand futures?", or into making some sort of estimate of what a future in which AI risks were only narrowly avoided could look like.
I thought this post was particularly cool because it seems to be applicable to lots of things, at least in theory (I have some concerns in practice). I'm curious about further reviews of this post.
I find myself using the reasoning described in the post in a bunch of places related to the prioritization of longtermist interventions. At the same time, I'm not sure I ever get any useful conclusions out of it. This might be because the area of application (predicting the impact of new technologies in the medium-term future) is particularly challenging. (One could argue that there's a "Which world gets saved" argument against "Which world gets saved" arguments: In worlds where the consideration is useful, we start out with wide uncertainty. Having wide uncertainty is bad for having lots of impact.)
I wonder if there's something a bit fishy about imagining only/exactly the worlds where one's contribution makes a pivotal difference (the post hints at this with the discussion of evidential decision procedures).
I wonder if the reasoning in this post can be used to strengthen particular combinations of interventions. E.g., "AI alignment research is particularly impactful in worlds where alignment is neither too hard nor too easy. In those worlds, there will be people who would build misaligned AIs if it weren't for our inputs. Accordingly, AI alignment research becomes particularly important if we combine it with raising awareness of safety mindset and building leading coalitions of safety-minded research teams."
Very interesting! This strikes me as a particular type of mission hedging, right?
Thanks! And cool, I hadn’t thought of that connection, but it makes sense—we want our x-risk reduction “investments” to pay off more in the worlds where they’ll be more valuable.
Great points, Trammell! Thank you for this post.
Your example comparing the peaceful-psychology hypothesis and the violent-psychology hypothesis is effective, and it stands on its own. However, I don't think it's the best way to represent Steven Pinker's argument, and I think representing that argument more accurately leads in some interesting new directions.
As I understand him, Pinker does not argue humans have a peaceful psychology. Rather, he acknowledges that there are many aspects of our psychology that predispose us to violence, and he attributes the historical decline in violence primarily to social developments, e.g., establishing trade networks, accumulating social norms, elevating women's power in society, etc. These changes in society have built stronger and stronger defenses against our violent instincts, which have remained relatively unchanged over the course of human history.
Stating the hypothesis this way raises the question of how far we expect this trend to continue. We would be much more interested in saving a world ("World 1") where society continues to grow increasingly less violent and more compassionate, and substantially less interested in saving a world ("World 2") where that trend stops, or where it doesn't go far enough fast enough to prevent huge new sources of suffering.
Moral circle expansion is a strategy to make our world more like the first one and less like the second one. Unlike the strategies discussed in this post, it doesn't deal with affecting the likelihood of extinction scenarios. Rather, it tries to directly influence the speed and direction of the trends that determine the expected value of the long-term future (a.k.a. it tries to shift us to a world more worth saving). For what it's worth, I think Sentience Institute is doing some really valuable research on this topic.
Thanks!
Just to be clear: my rough simplification of the "Pinker hypothesis" isn't that people have an all-around-peaceful psychology. It is, as you say, a hypothesis about how far we expect recent trends toward peace to continue. And in particular, it's the hypothesis that there's no hard lower bound to the "violence level" we can reach, so that, as we make technological and social progress, we will ultimately approach a state of being perfectly peaceful. The alternative hypothesis I'm contrasting this with is a future in which we can only ever get things down to, say, one world war per century. If the former hypothesis isn't actually Pinker's, then my sincere apologies! But I really just mean to outline two hypotheses one might be uncertain between, in order to illustrate the qualitative point about the conditional value of the future.
That said, I certainly agree that moral circle expansion seems like a good thing to do, for making the world better conditional on survival, without running the risk of "saving a bad world". And I'm excited by Sentience's work on it. Also, I think it might have the benefit of lowering x-risk in the long run (if it really succeeds, we'll have fewer wars and such). And, come to think of it, it has the nice feature that, since it will only lower x-risk if it succeeds in other ways, it disproportionately saves "good worlds" in the end.
Closely related, and also important, is the question of "which world gets precluded". Different possibilities include:
What will the rankings be like, if we sort the four precluded worlds in decreasing order of badness? I'm highly unsure here as well, but I would guess something like 4 > 2 > 3 > 1 (larger means worse).
Does it? It seems a lot of risk comes from accidental catastrophes rather than intentional ones. Accidental catastrophes to me don't seem like proof of the future being violent.
Also, I think that we should treat our efforts to reduce the risk of intentional catastrophe or inflicted suffering as evidence. Why wouldn't the fact that we choose to reduce the impact of malicious actors be proof that malicious actors' impact will be curtailed by other actors in the future?
As long as any of NTI's effort is directed against intentional catastrophes, they're still saving violent-psychology worlds disproportionately, so in principle this could swing the balance. That said, good point: much of their work should reduce the risk of accidental catastrophes as well, so maybe there's not actually much difference between NTI and asteroid deflection.
(I won't take a stand here about what counts as evidence for what, for fear that this will turn into a big decision theory debate :) )
Very interesting post.
But it seems to me that this argument assumes a relatively stable, universal, and fixed "human nature", and that that's a quite questionable assumption.
For example, the fact that a person was going to start a nuclear war that would've wiped out humanity may not give much evidence about how people tend to behave, if in reality behaviours are quite influenced by situations. Nor would it give much evidence about how people in general tend to behave if behaviours vary substantially between different people. Even if behavioural patterns are quite stable and universal, if they're at least quite manipulable, then the fact that that person would've started that war only gives strong evidence about current behavioural tendencies, not what we're stuck with in the long term. (I believe this is somewhat similar to Cameron_Meyer_Shorb's point.)
Under any of those conditions, the fact that that person would've started that war provides little evidence about typical human behavioural patterns in the long term, and thus little evidence about the potential value of the long-term future.
I suspect that there's at least some substantial stability and universality to human behaviours. But on the other hand, there's certainly evidence that situational factors are often important and that different people vary substantially (https://www.ncbi.nlm.nih.gov/pubmed/20550733).
Personally, I suspect the most important factor is how manipulable human behavioural patterns are. The article cited above seems to show a huge degree to which "cultural" factors influence many behavioural patterns, even things we might assume are extremely basic or biologically determined like susceptibility to optical illusions. And such cultural factors typically aren't even purposeful interventions, let alone scientific ones.
It's of course true that a lot of scientific efforts to change behaviours fail, and even when they succeed they typically don't succeed for everyone. But some things have worked on average. And the social sciences working on behavioural change are very young in the scheme of things, and their methods and theories are continually improving (especially after the replication crisis).
Thus, it seems very plausible to me that even within a decade we could develop very successful methods of tempering violent inclinations, and that in centuries far more could be done. And that's all just focusing on our "software": efforts focusing on our biology itself could conceivably accomplish far more radical changes. That is, of course, if we don't wipe ourselves out before this can be done.
I recently heard someone on the 80,000 Hours podcast (can't remember who or which episode, sorry) discussing the idea that we may not yet be ready, in terms of our "maturity" or wisdom, for some of the technologies that seem to be around the corner. They gave the analogy that we might trust a child to have scissors but not an assault rifle. (That's a rough paraphrasing.)
So I think there's something to your argument, but I'd also worry that weighting it too heavily would be somewhat akin to letting the child keep the gun based on the logic that, if something goes wrong, that shows the child would've always been reckless anyway.
Thanks!
This all strikes me as a good argument against putting much stock in the particular application I sketch out; maybe preventing a near-term nuclear war doesn't actually bode so badly for the subsequent future, because "human nature" is so malleable.
Just to be clear, though: I only brought up that example in order to illustrate the more general point about the conditional value of the future potentially depending on whether we have marginally averted some x-risk. The dependency could be mediated by one's beliefs about human psychology, but it could also be mediated by one's beliefs about technological development or many other things.
I was also using the nuclear war example just to illustrate my argument. You could substitute in any other catastrophe/extinction event caused by violent actions of humans. Again, the same idea that "human nature" is variable and (most importantly) malleable would suggest that the potential for this extinction event provides relatively little evidence about the value of the long-term. And I think the same would go for anything else determined by other aspects of human psychology, such as short-sightedness rather than violence (e.g., ignoring consequences of AI advancement or carbon emissions), because again that wouldn't show we're irredeemably short-sighted.
Your mention of "one's beliefs about technological development" does make me realise I'd focused only on what the potential for an extinction event might reveal about human psychology, not what it might reveal about other things. But most relevant other things that come to mind seem to me like they'd collapse back to human psychology, and thus my argument would still apply in just somewhat modified form. (I'm open to hearing suggestions of things that wouldn't, though.)
For example, the laws of physics seem to me likely to determine the limits of technological development, but not whether its tendency is to be "good" or "bad". That seems much more up to us and our psychology, and thus it's a tendency that could change if we change ourselves. The same goes for things like whether institutions are typically effective; that isn't a fixed property of the world, but rather a result of our psychology (as well as our history, current circumstances, etc.), and thus changeable, especially over very long time scales.
The main way I can imagine I could be wrong is if we do turn out to be essentially unable to substantially shift human psychology. But it seems to me extremely unlikely that that'd be the case over a long time scale and if we're willing to do things like changing our biology if necessary (and obviously with great caution).
How are the mentioned first and second objections distinct?
"Should we incorporate the fact of our own choice to pursue x-risk reduction itself into our estimate of the expected value of the future, as recommended by evidential decision theory, or should we exclude it, as recommended by causal?"
I fail to get the meaning. Could anybody reword this for me?
"The consideration is that, even if we think the value of the future is positive and large, the value of the future conditional on the fact that we marginally averted a given x-risk may not be."
Not sure I get this. Is a civilisation stalling irrevocably into chaos after narrowly surviving a pandemic a central example of this?
About the two objections: What I'm saying is that, as far as I can tell, the first common longtermist objection to working on x-risk reduction is that it's actually bad, because future human civilization is of negative expected value. The second is that, even if it is good to reduce x-risk, the resources spent doing that could better be used to effect a trajectory change. Perhaps the resources needed to reduce x-risk by (say) 0.001% could instead improve the future by (say) 0.002% conditional on survival.
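A rough way to compare the two, setting aside interactions between the interventions (with $V$ the value of the future as before): increasing the probability of survival by an amount $p$ adds roughly the left-hand quantity below in expectation, while improving the future conditional on survival by a proportion $q$ adds roughly the right-hand quantity:

$$\underbrace{p \cdot \mathbb{E}[V \mid \text{survival}]}_{\text{x-risk reduction}} \qquad\text{vs.}\qquad \underbrace{q \cdot P(\text{survival}) \cdot \mathbb{E}[V \mid \text{survival}]}_{\text{trajectory change}}.$$

With the example numbers, $p = 0.001\%$ and $q = 0.002\%$, the trajectory change comes out ahead whenever the probability of survival exceeds one half.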
About the decision theory thing: You might think (a) that the act of saving the world will in expectation cause more harm than good, in some context, but also (b) that, upon observing yourself engaged in the x-risk-reduction act, you would learn something about the world which correlates positively with your subjective expectation of the value of the future conditional on survival. In such cases, EDT would recommend the act, but CDT would not. If you're familiar with this decision theory stuff, this is just a generic application of it; there's nothing too profound going on here.
About the main thing: It sounds like you're pointing out that stocking bunkers full of canned beans, say, would "save the world" only after most of it has already been bombed to pieces, and in that event the subsequent future couldn't be expected to go so well anyway. This is definitely an example of the point I'm trying to make--it's an extreme case of "the expected value of the future not equaling the expected value of the future conditional on the fact that we marginally averted a given x-risk"--but I don't think it's the most general illustration. What I'm saying is that an attempt to save the world even by preventing it from being bombed to pieces doesn't do as much good as you might think, because your prevention effort only saves the world if it turns out that there would have been a nuclear disaster but for your efforts. If it turns out (even assuming that we will never find out) that your effort is what saved us all from nuclear annihilation, that means we probably live in a world that is more prone to nuclear annihilation than we otherwise would have thought. And that, in turn, doesn't bode well for the future.
Does any of that make things clearer?
Dear all,
For example: There are things we currently do not know about human psychology, some of which bear on how inclined we are toward peace and cooperation. Perhaps Steven Pinker is right, and violence will continue its steady decline, until one evening sees the world's last bar fight and humanity is at peace forever after.
Once we start thinking along these lines, we open various cans of worms. If our x-risk reduction effort starts far "upstream", e.g. with an effort to make people more cooperative and peace-loving in general, to what extent should we take the success of the intermediate steps (which must succeed for the x-risk reduction effort to succeed) as evidence that the saved world would go on to a great future? Should we incorporate the fact of our own choice to pursue x-risk reduction itself into our estimate of the expected value of the future, as recommended by evidential decision theory, or should we exclude it, as recommended by causal? How should we generate all these conditional expected values, anyway?
My belief is that teachings and information dissemination can only go so far to make people more cooperative and peace-loving. The final limitations we face are the fundamental human conditions that we are born with. And of course, if we set out to improve and enhance the base fundamentals attributable to our biological conditions, based on a concept of positive value creation benchmarked to the fundamental human conditions since the Big Bang, I believe that these should tend to magnitudes of infinity, and a near-zero concept of our physical vessel would be ideal for humanity.
The way I would think of the dilemma between EDT and causal decision theory is that we as humans have thrived since our origin as a result of cooperation and, more importantly, a love for ourselves and those around us; and I think people generally don't die, even from having too much of a good thing, but too much of a bad thing, on the balance of probabilities, kills more than the former.
Thank you, I will be happy to hear all of your comments on this.
Kind regards,
Wei Lun