This is the script of a talk I gave at EAGx Rotterdam, with some citations and references linked throughout. I lay out the argument challenging the relatively narrow focus EA has in existential risk studies, and in favour of more methodological pluralism. This isn't a finalised thesis, nor should it be taken as anything except a conversation starter. I hope to follow this up with more rigorous work exploring the questions I pose over the next few years, and hope others do too, but I thought I'd post the script to give everyone an opportunity to see what was said. Note, however, that the tone of this is obviously the tone of a speech, not that of a forum post. I hope to link the video when it's up. Mostly, however, this is really synthesising the work of others; very little of this is my own original thought. If people are interested in talking to me about this, please DM me on here.
Existential Risk Studies: the interdisciplinary “science” of studying existential and global catastrophic risk. So, what is the object of our study? There are many definitions of Existential Risk, including an irrecoverable loss of humanity's potential or a major loss of the expected value of the future, both of these essentially from a transhumanist perspective. In this talk, however, I will be using Existential Risk in the broadest sense, taking my definition from Beard et al. 2020, with Existential Risk being risk that may result in the very worst catastrophes, “encompassing human extinction, civilizational collapse and any major catastrophe commonly associated with these things.”
X-Risk is a risk, not an event. It is defined therefore by potentiality, and thus is inherently uncertain. We can thus clearly distinguish between different global and existential catastrophes (nuclear winters, pandemics) and drivers of existential risk, and there is no one-to-one mapping between them. The IPCC commonly, and helpfully, splits drivers of risk into hazards, vulnerabilities, exposures and responses, and through this lens it is clear that risk isn't something exogenous, but is reliant on decision-making and governance failures, even if that failure is merely a failure of response.
The thesis I present here is not original, and draws on the work of a variety of thinkers, although I accept full blame for things that may be wrong. I will argue there are two different paradigms of studying X-Risk: a simple paradigm and a complex paradigm. I will argue that EA unfairly neglects the complex paradigm, and that this is dangerous if we want to have a complete understanding of X-Risk to be able to combat it. I am not suggesting the simple paradigm is “wrong”, but that alone it currently doesn't, and never truly can, capture the full picture of X-Risk. I think the differences between the two paradigms of existential risk are diverse, with some of the differences being “intellectual”, due to fundamentally different assumptions about the nature of the world we live in, and some “cultural”, more contingent on which thinkers' works gain prominence. I won't really try to distinguish between these differences too hard, as I think this would make everything a bit too complicated. This presentation is merely a start, a challenge to the status quo, not asking for it to be torn down, but arguing for more epistemic and methodological pluralism. This call for pluralism is the core of my argument.
The “simple” paradigm of existential risk is at present dominant in EA circles. It tends to assume that the best way to combat X-Risk is to identify the most important hazards, find the most tractable and neglected solutions to those, and work on that. It often takes a relatively narrow range of epistemic tools, such as forecasting, toy models, game-theoretic approaches, thought experiments or well-thought-out “kill mechanism” causal chains, treating these as fundamentally useful tools for examining a future which is taken to be fundamentally understandable and, to a degree, predictable, if only we were rational enough and had enough information. It's a methodology that, given the relative lack of evidence on X-Risk, is based more on rationality than empiricism; a methodology that emerges more from analytic philosophy than empirical science. Thus, risks are typically treated quasi-independently, so the question “what is the biggest X-Risk?” makes sense, and we can approach X-Risk by focusing on quasi-discrete “cause areas” such as AGI, engineered pandemics or nuclear warfare.
Such an approach can be seen in works published by the community and in the assumptions around which programmes and more are set up. The Precipice finds the separation of X-Risks into the somewhat arbitrary categories of “Natural”, “Anthropogenic” and “Future” risks to be useful, and quantifies those risks based on what each of those quasi-independent hazards contributes. The Cambridge Existential Risk Initiative summer research fellowship that I was lucky to participate in this summer separated its fellows into categories based broadly on these separate, discrete risks: AI, Climate Change, Biosecurity, Nuclear Weapons and Misc+Meta. Once again, this promotes a siloed approach that sees these things as essentially independent, or at least assumes that treating them independently is the best way of understanding them. Even on the Swapcard for this conference, there is no category of interest for “Existential Risk”, “Vulnerabilities” or “Systemic Risk”, whilst there are two categories for AI, a category for Nuclear Security, a category for Climate Change and a category for Biosecurity. The “simple” approach to existential risk permeates almost all the discussions we have in EA about existential risk; it is the sea in which we swim. Thus, it profoundly affects the way we think about X-Risk. I think it could accurately be described, in the sense Kuhn discusses it, as a paradigm.
That's the simple approach. A world which, at its core, we can understand. A world where the pathways to extinction are to some degree definable, identifiable, quantifiable. Or at least, if we are rational enough and research enough, we can understand what the most important X-Risks are, prioritise these and deal with these. It's no wonder that this paradigm has been attractive to Effective Altruists; this stuff is our bread and butter. The idea that we can use rational methodologies in good-doing is what we were founded on, and it retains its power and strength through the ITN framework. The problem is, I'm not sure this is very good at capturing the whole picture of X-Risk, and we ignore the whole picture at our peril.
Because maybe the world isn't so simple, and the future not so predictable. Every facet of our society is increasingly interconnected: our ecological-climatic system coupling to our socio-economic system, global supply chains tied to our financial system tied to our food system. A future emerging from such complexity will be far from simple, or obvious, or predictable. Risk that threatens humanity in such a world will likely interact in emergent ways, or emerge in ways that are not predictable by simple analysis. Rather than predictable “kill mechanisms”, we might worry about tipping thresholds beyond which unsafe system transitions may occur; compounding, “snowballing” effects; worsening cascades; spread mechanisms of collapse; and where in a complex system we have the most leverage. Arguably, we can only get the whole picture by acknowledging irreducible complexity, and that the tools we currently use to give us relatively well-defined credences, and a sense of understanding and predictability about the future, are woefully insufficient.
I think it's important to note that my argument here is not “there is complexity, therefore risk,” but rather that the sort of global interconnected and interdependent systems we have in place make the sorts of risk we are likely to face inherently unpredictable, and so risk isn't so easily definable as the simple paradigm likes to make out. Even Ord acknowledges this unpredictability, putting the probability of “unforeseen anthropogenic risk” at 1 in 30; in fact, whilst I have constantly attacked the core of Ord's approach in this talk, I think he acknowledges many of these issues anyway. And it's not as if this approach, focusing on fuzzy mechanisms emerging out of feedback loops, thresholds and tipping, is wholly foreign to EA; it's arguable that the risk from AGI is motivated by the existence of a tipping threshold, which when passed may lead to magnifying impacts in a positive feedback loop (the intelligence explosion), which will lead to unknown but probably very dangerous effects that, due to the complexity of all the systems involved, we probably can't predict. This is rarely dismissed as pure hand-waviness, as we acknowledge we are dealing with a system that our reasoning can't fully comprehend. Whilst EAs tend to utilise a few of the concepts of the complex approach with AGI, elsewhere it's ignored, which is slightly strange, but more on this later.
It is arguable that the complex paradigm's focus on the complexity of the world is somewhat axiomatic, based on a set of assumptions about the way the world functions different to those of the simple approach: one that sees the world as a complex network of interconnected nodes, and risk as primarily emerging from the relatively well-known fragility and vulnerability of such a system. I don't think I can fully prove this to you, because I think it is a fundamental worldview shift, not just a change in the facts, but in the way you experience and understand the world. However, if you want to be convinced, I would look at much of the literature on complexity, on the coupled socio-technical-ecological-political system, the literature on risk such as the IPCC's, or texts like The Risk Society on how we conceptualise risk. I'm happy to talk more about this in the Q&A, but right now I hope that you're willing to come along for the ride even if you don't buy it.
This is why I treat this as an entirely different paradigm to the current EA paradigm. The complexity approach is fundamentally different. It sees the world as inherently complex, and whilst facets are understandable, at its core the system is so chaotic we can never fully, or even nearly fully, understand it. It sees the future as not just unpredictable but inherently undefined. It sees risk mostly emerging from our growing, fragile, interconnected system, and typically sees existential hazards as only one part of the equation, with vulnerabilities, exposures and responses perhaps at least as important. It takes seriously our uncertainty with regards to what the topography of the epistemic landscape is, and so uncertainty should be baked into any understanding or approach; it thus favours foresight over forecasting. The epistemic tools that serve the simple approach are simply not useful for dealing with the complexity that this paradigm takes as central to X-Risk, and thus new epistemic tools and frameworks must be developed; whether these have been successful is debatable.
A defender of the “simple” paradigm might argue that this is unfair: after all, thinkers like Ord discuss “direct” and “indirect” risks. This is helpful. The problem is, it's very unclear what constitutes a “direct” vs “indirect” existential risk. If a nuclear war kills almost everyone, but the last person alive trips on a rock and falls off a cliff, which was the direct existential risk? The nuclear war or the rock? Well, this example could rightfully be considered absurd (after all, if only one person is alive, humanity will go extinct after that person dies), but I hope the idea still broadly stands: very few “direct” existential risks actually wipe the last person out. What about a very deadly pandemic that can only spread due to the global system of international trade, where the response of reducing transport, combined with climate change, causes major famines across the world, and only both combined cause collapse and extinction? Which is the direct risk? Suddenly, the risk stops looking so neat and simple, but remains just as worrying.
This logic of direct and indirect doesn't work, because it still favours a quasi-linear, mechanistic worldview. Often, something is only considered a “risk factor” if it leads to something that is a direct risk. Such arguments can be seen in John Halstead's enormous climate Google doc, which I think is a relatively good canonical example of the “simple” approach. Here, he argues climate change is not a large contributor to existential risk because it can't pose a direct risk, and isn't a major contributor to things that would then wipe us out. So it's not a direct risk, nor a first-order indirect risk; so it's not really a major risk. In fact, because of the simplicity of merely needing to identify the answer to whether it is a direct risk or a first-order indirect risk, there is not even a need for a methodology, or that slippery word “theory”; one can merely answer the question by thinking about it and making a best guess. The type of system and causal chain dealt with is within the realm where one person can make such a judgement; if you acknowledge the complexity of the global network, such reliance on individual reasoning appears like dangerous overconfidence.
You might then say that the simple approach can still deal with these issues by looking at second-order indirect risks, third-order, fourth-order and so on. But what happens when you get to nth-order indirect risks? This mechanistic, predictable worldview simply cannot deal with that complexity. A reply to this may be that direct risks are just so much larger in expectation; however, this doesn't fit with our understanding from the study of complex and adaptive networks, and work done by scholars like Lara Mani on volcanoes further shows that cascading nth-order impacts of volcanic eruptions may be far larger than the primary direct impacts. Even take the ship stuck in the Suez Canal: the ripple effects seem far larger than the initial, direct effect. The same may turn out to be true of the long-term impacts of COVID-19 as well.
Thus it seems the simple approach struggles when dealing with the ways most risks tend to manifest in the real, complex, interconnected world: through vulnerabilities and exposures, through systemic risks and through cascades. In fact, the simple approach tends to take Existential Risk to be synonymous with Existential Hazards, relegating other contributors to risk, like vulnerabilities, exposures and responses, to the background. It has no real theory of systemic risk, hence the lack of need for defined methodologies, and when I mentioned cascading risk to John Halstead in the context of his climate report, he said he simply didn't think it worth investigating. I don't think this is a problem with John: despite our disagreements he is an intelligent and meticulous scholar who put a lot of effort into that report. I think this is a problem of simple existential risk analysis: it is not capable of handling the complexity of the real world.
So we need complex risk analysis that acknowledges the deep interconnectedness, emergence and complexity of the global system we are in to truly analyse risk. But here we are faced with a dilemma. On the one hand, we have a recognition of the irreducible complexity of the world, and the inherent uncertainty of the future. On the other, we need to act within this system and understand the risks so we can combat them. So the question is: how?
The first step towards a more complex risk analysis picks up the baton from the simple approach, emphasising compounding risk: how different hazards interact. More will be said on this later.
Secondly, risk is expanded beyond the concept of existential hazards, which is what the simple paradigm focuses on, to include vulnerabilities and exposures, as well as responses. To explain vulnerabilities and exposures, imagine someone with a peanut allergy: the peanut is the hazard, the allergy the vulnerability, and being in the same room as the peanut the exposure. The hazard is what kills you, the vulnerability is how you die, and the exposure is the interface between the two. So we can expand what we should do to combat existential risk from just “putting out fires”, which is what the hazard-centric approach focuses on, to a more systemic approach focused on making our overall system more resilient to existential risk. We might identify key nodes where systemic failure could occur and try to increase their resilience, such as the work Lara Mani has been doing identifying global pinch points where small-magnitude volcanic eruptions may cause cascading impacts resulting in a global catastrophe.
In doing this, we are abandoning the nice, neat categories the simple approach creates. In many ways, it no longer makes sense to talk about risks, as though these were quasi-independent “fires” to put out. Rather, it makes sense to speak about contributors to overall risk, with attempts made to shift the system towards greater security by identifying and reducing sources of risk. This doesn't just include hazards, but other contributors as well; not just acknowledging the initial effect, but everything that made each cascade more likely. These cascades are not predictable, the thresholds beyond which feedback loops occur not knowable, and thus foresight, where we may get a sample of what could occur, will be far more useful than forecasting, where we try to predict what will occur. This simple linguistic shift, from risks to risk, can be surprisingly powerful at highlighting the difference between the simple and complex approaches.
Acknowledging that we don't know the pathways to extinction actually opens up new approaches to combatting risk. We may see reducing systemic vulnerability as more impactful than under the simple approach, or see reducing the probability of feedbacks and of passing thresholds beyond which we may reasonably assume catastrophe may follow as appropriate courses of action. Or, even if we are unsure about what exactly will kill us, we might want to focus on what is driving risk in general rather than specific hazards, be it work on “agents of doom” or Bostrom's vulnerable world emerging out of a semi-anarchic default condition. Whilst the complex approach acknowledges the difficulties that the nonlinearities and complexities bring, in other ways it allows for a broader repertoire of responses to risk as well, as Cotton-Barratt et al.'s work on defence in depth shows, for example.
Another approach to complexity may be what might be called the “Planetary Boundaries” approach. Here, we identify thresholds within which we know the system is safe, and try to avoid crossing into the unknown. It's like we're at the edge of a dark forest: it may be safe to walk in, but better safe than sorry. It applies a precautionary principle: that in such a complex system, we should have the epistemic humility to simply say “better the devil you know.” This approach has rightfully been critiqued by many who tend to favour a more “simple” approach; it is very hand-wavy, with no clear mechanism to extinction or even collapse, and with the boundaries chosen somewhat arbitrarily. Nevertheless, it may be argued that lines had to be drawn somewhere, and wherever they were drawn would be arbitrary; so this is a “play it safe” approach, because we don't know what lies beyond these points, rather than an “avoid knowable catastrophe” approach. However, such an approach is very problematic if we want to prioritise between approaches, something I will briefly discuss later.
Something similar could be said of a solution to Bostrom's “Vulnerable World” and Manheim's “Fragile World.” If increasing technological development and complexity puts us in danger, then maybe we should take every effort to stop this; after all, these things are not inevitable. Of course, Bostrom would never accept this (to him this alone poses an X-Risk) and instead proposes a global surveillance state, but that is slightly beside the point.
However, we are still faced with a number of problems. We are constantly moving into unprecedented territory. And sometimes we are not left with an option which is nice and without trade-offs. MacAskill somewhat successfully argues that technological stagnation would still leave us in danger from many threats. Sometimes we have already gone into the forest, and we can hear howling, and we have no idea what is going on, and we are posed with a choice of things to do, but no option is safe. We are stuck between a rock and a hard place. Under such deep uncertainty, how can we act if we refuse to reduce the complexity of the world? We can't just play it safe, because every option fails a precautionary principle. What do we do in such cases?
This is the exact dilemma that faces me in my research. I'm researching the interactions of solar radiation modification and existential risk: both how it increases risk and how it decreases it. As it is simultaneously combatting a source of risk and itself increasing risk, the sort of “play it safe” approach to complexity just doesn't necessarily work. But before I properly explain how I am attempting to unpick this, I ought to explain exactly what I'm on about.
Solar Radiation Modification (SRM), otherwise known as solar geoengineering, is a set of technologies that aim to reflect a small amount of sunlight to reduce warming. Sunlight enters the Earth system and some is reflected; that which isn't is absorbed by the Earth and re-emitted as long-wave infrared radiation. Some of this escapes to space, and some gets absorbed by greenhouse gases in the atmosphere, warming it. As we increase greenhouse gas concentrations, we increase the warming. SRM tries to reduce this warming by decreasing the amount of light entering the Earth system: by injecting aerosols into the stratosphere, mimicking the natural effects of volcanoes; by brightening clouds; or by a related technique that isn't quite the same, which involves thinning other clouds. This would likely reduce temperatures globally, and the climate would generally be closer to preindustrial, but it comes with its own risks that may make it more dangerous.
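To make the mechanism concrete, here is a toy zero-dimensional energy-balance model of my own (textbook physics, not anything from the SRM literature): it shows why dimming the sun by only a percent or so can offset the radiative forcing of a CO2 doubling. The constants are standard values; the 3.7 W/m² forcing figure for doubled CO2 is the commonly cited one, and the model deliberately ignores feedbacks and all regional structure.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0      # solar constant, W m^-2
ALBEDO = 0.3     # planetary albedo (fraction of sunlight reflected)
F_2XCO2 = 3.7    # commonly cited radiative forcing of doubled CO2, W m^-2

def emission_temp(solar=S0, forcing=0.0):
    """Effective emission temperature (K) of a zero-dimensional
    energy-balance model: absorbed shortwave + forcing = SIGMA * T^4."""
    absorbed = solar * (1 - ALBEDO) / 4 + forcing
    return (absorbed / SIGMA) ** 0.25

baseline = emission_temp()               # ~255 K effective temperature
warmed = emission_temp(forcing=F_2XCO2)  # ~1 K warmer, before any feedbacks

# SRM in this toy: dim the sun just enough that the lost shortwave
# cancels the extra forcing, i.e. dS * (1 - ALBEDO) / 4 = F_2XCO2.
dS = 4 * F_2XCO2 / (1 - ALBEDO)
cooled = emission_temp(solar=S0 - dS, forcing=F_2XCO2)

print(round(warmed - baseline, 2))   # forced warming in this toy
print(round(dS / S0 * 100, 2))       # percent of sunlight needed to offset it
```

The only point of the sketch is the ratio: cutting incoming sunlight by roughly 1.5% cancels the doubled-CO2 forcing in this toy, which is why SRM proposals involve such small fractional changes. It says nothing about the regional and hydrological side effects that create SRM's own risks.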
Those working from a simple paradigm have tended to reject risks from climate change as especially large. Toby Ord estimates the risk at 0.1%. Will MacAskill, in What We Owe the Future, suggests “its hard to see how even [7-10 degrees of warming] could cause collapse.” Both have tended to use proxies for what would cause collapse, trying their best to come up with simple, linear models of catastrophe: Toby Ord wants to look at whether heat stress will cause the world to become uninhabitable, and Will wants to look at whether global agriculture will entirely collapse. These simple proxies, whilst making it easier to reason about simple causal chains, are just not demonstrative of how risk manifests. Some have then attempted to argue about whether climate change poses a first-order indirect existential risk, which is mostly John Halstead's approach in his climate report, but once again, I think this misses the point.
From a more complex paradigm, I think climate change becomes something to be taken more seriously, because not only does it make hazards more likely and stunt our responses, but it also, and perhaps more keenly, makes us more vulnerable, and may act to majorly compound risk in ways that make catastrophe far more likely. A variety of these scenarios where a “one hazard to kill us all” approach doesn't work was explored in the recent “Climate Endgame” paper. One area where that paper strongly disagrees with the status quo is “systemic risk.” In The Precipice, Ord argues that a single risk is more likely than two or more occurring in unison; Climate Endgame, however, explores how climate change has the ability to trigger widespread, synchronous, systemic failure via multiple indirect stressors: food system failures, economic damage, water insecurity and so on coalescing and reinforcing until you get system-wide failure.

A similar, but slightly different, risk is that of a cascade, with vulnerabilities increasing until one failure sets off another, and another, with the whole system snowballing; in the case of climate, this may not just refer to our socio-economic system, but evidence of tipping cascades in the physical system shows that there is a non-negligible chance of major, near-synchronous collapse of major elements in the Earth system. Such spread of risk is well documented in the literature, as occurred in the 2008 financial crisis, but has been almost entirely neglected by the simple paradigm of existential risk. The ability for such reinforcing, systemic risk to arise from initial hazards far smaller than the simple paradigm would consider “catastrophic” should really worry us: normally, lower-magnitude hazards are more common, and we are likely severely neglecting these. If one takes such systemic failures seriously, climate change suddenly looks a lot more dangerous than the simple approach allows it to be.
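As a purely illustrative sketch (the network and numbers here are made up by me, not calibrated to anything in the climate or systemic-risk literature), the threshold behaviour that makes cascades worrying can be shown in a few lines: on a randomly interconnected system, the same small initial failure either fizzles out or engulfs most of the network, depending on whether each failure spreads to more or fewer than one neighbour on average.

```python
import random

def simulate_cascade(n_nodes=100, n_links=300, p_spread=0.3, seed=0):
    """Toy cascade on a random network: node 0 fails, and every failure
    independently knocks out each unfailed neighbour with p_spread."""
    rng = random.Random(seed)
    # Build a random graph with n_links distinct undirected edges.
    neighbours = {i: set() for i in range(n_nodes)}
    edges = 0
    while edges < n_links:
        a, b = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if a != b and b not in neighbours[a]:
            neighbours[a].add(b)
            neighbours[b].add(a)
            edges += 1
    # Propagate the failure.
    failed, frontier = {0}, [0]
    while frontier:
        node = frontier.pop()
        for nb in neighbours[node]:
            if nb not in failed and rng.random() < p_spread:
                failed.add(nb)
                frontier.append(nb)
    return len(failed)  # cascade size, including the initial failure

def mean_size(p_spread, runs=50):
    return sum(simulate_cascade(p_spread=p_spread, seed=s) for s in range(runs)) / runs

# Below the threshold (~1 / mean degree) failures stay local;
# above it, the typical outcome is system-wide collapse.
print(mean_size(0.05))  # usually only a handful of nodes fail
print(mean_size(0.30))  # usually most of the network fails
```

With a mean degree of six, spread probabilities well below about one sixth give cascades that stay local, while those above it make system-wide failure the typical outcome. The qualitative lesson is the one in the text: modest changes in vulnerability, not just in hazard size, can flip the system between these regimes.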
So, a technology like SRM that can reduce climate damage may seriously reduce the risk of catastrophe. There is a significant amount of evidence to suggest that SRM moderates climate impacts at relatively “median” levels of warming. However, one thing that has hardly been explored is the capacity of SRM to prevent us hitting those Earth-system tipping thresholds, which, whilst not essential for spreading systemic risk, are certainly one key contributor to existential risk from climate change being higher. So, alongside some colleagues at Utrecht and Exeter, we are starting to investigate the literature, models and expert elicitations to try to make a start at understanding this question. This is one way one can deal with complexity: make a start with things which we know contribute to systemic risk in ways that could plausibly be catastrophic, and observe whether these can be reduced.
However, SRM also acts as a contributor to risk. In one sense, this contribution is easier to understand from the simple paradigm, as it is a direct contribution to great power conflict, which is often itself considered a first-order indirect risk. So here we can perhaps agree! This has been explored in many people's work: some just simple, two-variable analyses of the interaction of SRM and volcanic hazards, whilst others try to highlight how SRM may change geopolitics and tensions in a way which may change how other risk spreads and compounds. One key way it does this is by coupling our geopolitical-socio-political system with the ecological-climatic system, allowing risk to spread from our human system to the climatic system that supports us much faster than before. This might really worry us, given how our climatic system then feeds back into our human system, and so on.
A second manner in which it contributes is so-called latent risk: a risk that lies “dormant” until activated. Here, if you stop carrying out SRM, you get rapid warming, what is often called “termination shock”, and faster rates of warming likely raise risk through all the pathways discussed for climate change. However, to add another wrinkle, such termination is mostly plausible because of another global catastrophe, so what would occur is what Seth Baum calls a “Double Catastrophe”, again highlighting how synchronous failure might be more likely than single failure! To get a better understanding of the physical effects of such a double catastrophe under different conditions, I have been exploring how SRM would interact with another catastrophe with climatic effects, namely a catastrophe involving the injection of soot into the stratosphere after a nuclear exchange. Here, it's very unclear that the “termination shock” and the other effects of SRM actually make the impacts of such an exchange worse, and it is likely that they actually act to slightly moderate the effects. I think this shows we cannot simply say “interacting hazards and complex risk = definitely worse,” but I also think it shows that the neglect of such complex risk by the simple approach loses a hell of a lot of the picture.
The other thing I am trying to explore is the plausible cascades and spread mechanisms of risk which SRM encourages. In part, I am doing this through foresight exercises like ParEvo, where experts are brought together to generate collaborative and creative storylines of diverging futures. Unlike forecasts, such scenarios don't have probabilities attached; in fact, due to the specificity needed, a good scenario should have probability zero, like a point on a probability distribution, but it can hopefully give us a little bit of a map of what could occur. So we highlight a whole load of plausible scenarios, acknowledging that none of these is likely to come to fruition, but on the premise that they should highlight some of the key areas on which good action should focus. For example, my scenarios will focus on different SRM governance schemes' responses to different catastrophic shocks, hopefully highlighting common failures of governance systems under more heavy-tailed shocks. Scenarios are useful in many other areas too, such as the use of more “game-like” scenarios like Intelligence Rising to highlight the interactions of the development of AGI with international tensions and geopolitics.
Nonetheless, ultimately what is needed is to do a risk-risk-risk-risk-risk analysis, comparing the ways SRM reduces and contributes to risk, and what leverage we might have to reduce each of those contributors. This is a way off, and I am unsure if we have good methodologies for this yet. Nonetheless, by acknowledging the large complexities, and utilising methods to uncover how SRM can contribute to and reduce risk in the global interconnected system, we get a far better picture of the risk landscape than under the simple approach. Many who take the simple approach have been quite happy to reject SRM as a risky technology without major benefit in mitigating X-Risk, and have been happy to do a “quick and dirty” risk-risk analysis based on simple models of how risks propagate. As we explore the more complex feedbacks, interactions and cascades of risk, the validity of such simple analyses is, I think, brought into question, highlighting the need for the complex paradigm in this field.
Finally, it's important to note that it is not obvious how any given upstream action, like research for example, contributes to risk. Even doing research helps to rearrange the risk landscape in unpredictable ways, and so even answering whether research makes deployment of the technology more likely is really hard, as research spurs governance, affects tensions, and affects our ability to discover different technologies and make the system more resilient, and so on. Once again, this complex web of impacts of research needs to be untangled. This is tricky, and needs to be done very carefully. But given EA dominates the X-Risk space, it's not something we can shirk.
I also think it's important to note that these approaches, whilst often working from different intellectual assumptions, have their differences manifest predominantly culturally rather than intellectually. In many ways, the two approaches converge; this is perhaps surprising, and maybe acts as a caution against my grandstanding about fundamental axiomatic differences. For example, the worry about an intelligence explosion is, at its core, I think, a worry about a threshold beyond which we move into a dangerous unknown, with systems smarter than us which may, likely by some unknown mechanism, kill us. In many ways, it should be more comfortable inside the “complex” paradigm, without a well-thought-out kill mechanism, acknowledging the irreducible underdetermination of the future and how powerful technologies, phenomena and structures within our complex interconnected system are likely to contribute hugely to risk, than in the simple paradigm. Similarly, the work that thinkers like Luke Kemp, who ostensibly aligns more with the “complex” paradigm, have done on the agents of doom, which tries to identify the key actors that are drivers of risk (mostly drivers of hazards), probably fits more neatly in the “simple” paradigm than the complex one. I think these cultural splits are important as well, and they probably imply that a lot of us from across the spectrum are missing potentially important contributors to existential risk, irrespective of our paradigm.
As a coda to this talk, I would like to briefly summarise Adrian Currie's arguments in his wonderful paper "Existential Risk, Creativity and Well Adapted Science." This is relevant perhaps a level above what I am talking about, concerning what "meta-paradigm" we should take. He suggests that all research has a topographical landscape, with "peaks" representing important findings; research is thus a trade-off between exploring and exploiting this landscape. I think the simple approach is very good at exploiting certain peaks, but particularly bad at understanding the topography of the whole landscape, which the complexity paradigm is much better at. But as Currie convincingly argues, this probably isn't sufficient. X-Risk studies is in a relatively novel epistemic situation: the risks it deals with are unique, in the words of Carl Sagan "not readily amenable to experimental verification… at least not more than once." The systems are wild and thus don't favour systematic understanding. We are not just uncertain about the answers to key questions, but also uncertain about what to ask. It is a crisis field, centred around a goal rather than a discipline; in fact, we are uncertain which disciplines matter most. All of this leaves us in an epistemic situation where uncertainty, and thus creativity, should be at the core of our approach, both to build an understanding of the topography of the landscape and because it stops us getting siloed. On the spectrum of exploring vs exploiting, exploratory approaches should be favoured, because we should reasonably see ourselves as deeply uncertain about nearly everything in X-Risk. Even if people haven't managed to "change our minds" on a contributor to risk, experience should tell us that we are likely to be wrong in ways that no one yet understands, and there are quite probably even bigger peaks out there.
We also should be methodological omnivores, happy to use many methodologies, tailoring them to local contexts, and taking a pluralistic approach to techniques and evidence, increasing the epistemic tools at our disposal. Both of these imply the need for pluralism, rather than the hegemony of any one approach. I am very worried that EA's culture and financial resources are pushing us away from creativity and towards conservatism in the X-Risk space.
In conclusion, this talk hasn't shown you that the simple approach is wrong, just that it provides a thoroughly incomplete picture of the world, one insufficient for dealing with the complexity of many drivers of existential risk. This is why I, and many others, call for greater methodological diversity and pluralism, including a concerted effort to come up with better approaches to complex and systemic risk. The simple approach is clearly problematic, but it is far easier to make progress on problems using it; it's like the Newtonian physics of existential risk studies. But to get a more complete picture of the field, we need a more complex approach. Anders Sandberg put this nicely, seeing the "risk network" as having a deeply interconnected core, where the approach of irreducible complexity must dominate; a periphery with fewer connections, where a compounding-risk approach can dominate; and a far-off periphery, with relatively few connections between hazards, where the simple, hazard-centric approaches dominate. The question, which is probably more axiomatic than anything else, is where the greatest source of risk lies. But both methods of analysis clearly have their place.
As EA dominates the existential risk field, it is our responsibility to promote pluralism, through our discourses and our funding. Note, as a final thing, that this isn't the same as openness to criticism based on "change my mind" around a set of rules which a narrow range of funders and community leaders set. Rather, we need pluralism, where ideas around existential risk coexist rather than compete, encouraging exploration, creativity and, in the terms of Adrian Currie, "methodological omnivory." There are so few evidentiary feedback loops that many of our answers to methodological or quasi-descriptive questions tend to be based on our prior assumptions, as there often isn't enough evidence to shift these much, or the evidence can be explained in multiple ways. This means our values and assumptions hugely impact everything, so having a very small group of thinkers and funders dominate and dictate the direction of the field is dangerous, essentially no matter how intelligent and rational we think they are. So we need to be willing not just to tolerate but to fund and promote work on X-Risk that we individually may think is a dead end, and to cede power to increase the creativity possible in the field, because under such uncertainty, promoting creativity and diversity is the correct approach, as hard as that is to accept. How we square this circle with our ethos of prioritisation and effectiveness is a very difficult question, one I don't have the answer to. Maybe it's not possible; but EAs seem to be very good at expanding the definition of what is possible in combatting the world's biggest problems. This is a question that we must pose, or we risk trillions of future lives. Thank you.
Thanks so much for posting this Gideon. I like your way of framing this into these two loose clusters, and especially your claim that it is good to have both. I completely agree. While my work is indeed more within the simple cluster, I feel that a fight over which approach is right would be misguided.
All phenomena can be modelled at lesser or greater degrees of precision, with different advantages and disadvantages of each. Often there are some sweet spots where there is an especially good tradeoff between accuracy and ability to actually use the model. We should try to find those and use them all to illuminate the issue.
There is a lot to be said for simple and for complex approaches. In general, my way forward with all kinds of topics is to start as simple as possible and only add complexity when it is clearly needed to address a glaring fault. We all know the truth is as complex as the universe, so the question is not whether the more complex model is more accurate, but whether it adds sufficient accuracy to justify the problems it introduces, such as reduced usability, reduced clarity, and overfitting. Sometimes it clearly is. Other times I don't see that it is and am happy to wait for those who favour the complex model to point to important results it produces.
One virtue of a simple model that I think is often overlooked is its ability to produce crisp insights that, once found, can be clearly explained and communicated to others. This makes knowledge sharing easier and makes it easier to build up a field's understanding from these crisp insights. I think the kind of understanding you gain from more complex models is often more a form of improving your intuitions and is harder to communicate, and doesn't typically come with a simple explanation that the other person can check to see if you are right without spending a similar amount of time with the model.
I really appreciate the explorative, curious, open and constructive approach Toby!
On 'what are some important results that a complex model produces', one nice example is a focus on vulnerability. That is, focus on improving general resilience, as well as preventing and mitigating particular hazards. This has apparently become best practice in many companies - e.g. rather than just listing hazards, focus also on having adequate capital reserves and some slack/redundancy in one's supply chains.
Matt Boyd and Nick Wilson have done some great complex-model-ish work looking at the resilience of island nations to a range of scenarios. One thing that turned up is that Aotearoa New Zealand has lots of food production, but distribution of that food relies on road transport, and the country has closed its only oil refinery. Keeping an oil refinery might increase its resilience/decrease its vulnerability.
I don't think that point would necessarily have come up in a 'simple-model' approach, but it's concrete, tractable, important and plausibly a good thing to suggest the govt act on.
Of course, you touch on vulnerabilities in The Precipice. Nevertheless, it's fun to wonder what a sequel would look like with each chapter framed around a critical system/vulnerability (food, health, communications) rather than around a particular hazard.
Thanks for this Gideon. Having read this and your comments on my climate report, I am still not completely sure what the crux of the disagreement is between us. I get that you disagree with my risk estimates, but I don't really understand why. Perhaps we could discuss on here, if you were up for it
I obviously think we need more time to flesh out the real cruxes, but I think our differences probably come down to a few things:
I also think it's important to note that I make these claims (mostly) in the context of X-Risk. I think in "normal" scenarios I would fall much closer to agreeing with you on a lot of things. But I think I have both a different ontology of existential risk (emerging mostly out of complex systems, so more like what's laid out in Beard et al 2021 and Kemp et al 2022) and, perhaps more importantly, a more pessimistic epistemology. As (partially) laid out when I discuss "Existential Risk, Creativity and Well Adapted Science" in the talk, I think that with existential risk, negative statements ("this won't do this") actually carry a higher evidentiary burden than positive statements of a certain flavour ("it is plausible that this could happen"). Perhaps it is because my priors of existential risk from most things are pretty low (owing, I think, in part to my pessimistic epistemology) that it just takes much more evidence to cause me to update downwards than to think "huh, this could be a contributor to risk actually!"
Does this capture our cruxes? I know this doesn't go into object-level aspects of your report, but I think it may do a better job of explaining why we disagree, even though I do think your analysis is top-notch, albeit using a methodology I disagree with for existential risk.
I also think it's important that you know I'm still not quite sure if I'm using the right language to explain myself here, and that my answer here is about why I find your analysis unconvincing, rather than it being wrong. Perhaps as my views evolve I will look back and think differently. Anyway, I really would like to talk to you more about this at some point in the future.
Does this sound right to you?
Thanks, yes, that is helpful. Perhaps we can now get into the substance.
To answer each of your points in turn
I hope this has helped show some of our disagreements! :-)
Hi John, sorry this has taken a while.
Thanks for this, it is useful. What is your estimate of the existential risk due to climate change? I obviously have it very low, so it would be useful to know where you are at on that. Could you explain what the main drivers of the risk are, from your point of view? Then we can get into the substance a bit more.
I suppose the problem with that question, from my perspective, is that I don't think "existential risk due to X" really exists, as I explain in the talk. In terms of the number of percentage points it raises overall risk by, I would put climate change between <0.01% and 2%, and I would probably put overall risk at between 0.01% and 10% or something. But I'm not sure that I actually have much confidence in many approaches to X-Risk quantification (as per Beard et al 2020a), even if quantification does make comparison easier. Some of the main contributions to risk from climate, noting that a number may also be unknown or unidentifiable:
Mostly these increase risk by:
- Increasing our vulnerability
- Multiple stressors coalescing into synchronous failure
- The major increase in systemic risk
- The responses we take
- Cascading effects leading to fast or slow collapse, then extinction
Hi Gideon,
I recognize that your questions may be rhetorical, but here are some answers:
Here's a thought about the use of the word "ontology". I actually chose that word myself for a criticism I submitted to the Red Team Contest this year. I think no one has read it. However, I suspect that its use by you, someone who gets noticed, could put EAs off, since it is rarely used outside discussions of knowledge representation or philosophy. That said, I agree with your use of it. However, if you have doubts, other words or phrases with a similar meaning to "ontology" include:
In a revision of my criticism (still in process), I introduce a table of alternatives:
**EDIT:** Sorry, I cannot get this table to render well.
I'm not recommending those changes to your vocabulary, since you are dealing with foresight and forecasting while juggling models from Ord, Halstead, and other EAs. However, if you do intend to "take a break" from thinking probabilistically, consider some of the alternatives I offered here. It can also be helpful to make these changes when your audience needs to discuss scenarios as opposed to forecasts.
I have not spent much time studying geo-engineering, but I have formed the impression that climate scientists look at polar use of water vapor for marine cloud brightening with less fear than the use of aerosols like diamond dust elsewhere in the world. EDIT: Apparently marine cloud brightening is a local effort with a much shorter residence time, giving more time for gathering feedback, whereas aerosol dusts are generally longer-term and potentially global.
Also I recall a paint that is such a brilliant white that its reflectivity should match that of clean snow. If the world's roofs were painted with that paint, could that cool the planet through the albedo effect, or would the cooling effect remain local? I need some clarity on the albedo effect, but I'll leave the math to you for the moment, and best of success with your efforts!
Hi John, thanks for the comment, I've DM'd you about it. I think it may be easier if we did the discussion in person before putting something out on the forum, as there is probably quite a lot to unpack, so let me know if you would be up for this?
I worry that a naïve approach to complexity and pluralism is detrimental, but agree that this is important. As you said, "the complex web of impacts of research also need to be untangled. This is tricky, and needs to be done very carefully."
I also think that you're preaching to the choir, in an important sense. The people in EA working on existential risk reduction are aware of the complexity of the debates and discussions, while the average EA posting on the forum seems not to be. This is equivalent to the difference between climate experts' views and those of the lay public.
To explain the example more, I think that most people's view of climate risk isn't that it destabilizes complex systems and may contribute to risk, understood broadly, in unpredictable ways. Their view is that it's bad, that we need to stop it, and that worrying about other things isn't productive because we need to do something about the bad thing now. But this leads to approaches that could easily contribute to risks rather than mitigate them - a more fragile electrical grid, or, as you cited from Tang and Kemp, more reliance on mitigations like geoengineering that are poorly understood and build in new systemic risks of failure.
Of course, popular science books don't necessarily go into the details, or when read casually leave the lay public with an at least somewhat misleading view - but one that pushes in the direction of supporting actions that the experts recommend. (Note that as a general rule, people working in the climate space are not pushing for geoengineering; they are pushing for emissions reductions, work increasing resilience to impacts, and similar.) The equivalent in EA is skimming The Precipice and ignoring Toby's footnotes, citations, and cautions. Those first starting to work on risk and policy, or writing EA Forum posts, often have this view, but I think it's usually tempered fairly quickly via discussion. Unfortunately, many who see the discussions simply claim longtermism is getting everything wrong, while agreeing with us on both priorities and approaches.
So I agree that we need to appreciate the more sophisticated approaches to risk, and blend them with cause prioritization and actually considering what might work. I also strongly applaud your efforts to inject nuance and push in the right direction appropriately, without ignoring the nuance and complexity. And yes, squaring the circle with effectiveness is a difficult question - but I think it's one that is appreciated.
The promotion of pluralism allows for greater epistemic checks and balances in a way that seems unparalleled in good thinking.
Thank you so much for bringing these ideas to the forefront, Gideon. Absolute legend.
While I do suggest a 0.1% probability of existential catastrophe from climate change, note that this is on my more restricted definition, where that is roughly the chance that humanity loses almost all its longterm potential due to a climate catastrophe. On Beard et al's looser definition, I might put that quite a bit higher (e.g. I think there is something more like a 1% chance of a deep civilisation collapse from climate change, but that in most such outcomes we would eventually recover). And I'd put the risk factor from climate change quite a bit higher than 0.1% too — I think it is more of a risk factor than a direct risk.
The problem, in my view, is that climate change could, if severe enough (say >3.5 degrees before 2100), become a "universal stressor", increasing the probability of various risks that in turn make other risks more likely. For example: economic stagnation, institutional decay, political instability, inter-state conflicts, great power conflicts, zoonotic spillover events, large and destabilizing refugee flows, famine, etc. Every item on this list is made more likely on a warmer planet, but also made worse, because we will have fewer resources to deal with them.
Each of these adverse events also increases the risk of other adverse events. So even if CC only increases the risk of each event by a small percent, the total risk added to the system could be considerable.
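To make that arithmetic concrete, here is a small sketch with purely illustrative numbers (none of these probabilities come from the discussion above), showing how modest per-event increases compound into a substantially larger chance of at least one adverse event:

```python
# Illustrative sketch only: the event probabilities below are made up,
# not estimates from this discussion.
def p_any(probs):
    """Probability that at least one of several independent events occurs."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

baseline = [0.05] * 6  # six adverse events, each at 5% per period
stressed = [0.07] * 6  # each nudged up by two percentage points

# The chance of at least one event rises from roughly 26% to roughly 35%.
# This treats the events as independent; if one failure makes the others
# more likely, as the comment argues, the jump is larger still.
print(round(p_any(baseline), 3), round(p_any(stressed), 3))
```

This understates the comment's point, since it ignores exactly the cross-event amplification being described; it only shows that even the independent case is non-trivial.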
With regards to the worst risks, this becomes even more problematic. Consider a nuclear winter scenario. That is pretty bad. But a nuclear winter scenario in combination with (and partly caused by) a severe climate crisis is much worse (since CC will affect many countries that will be spared from NW, but also because countries suffering from CC will have fewer resources to help refugees etc).
Now consider the added risk that a zoonotic spillover event might happen. This is also made more likely by CC. But in the case where we combine social collapse due to CC with zoonotic spillover, it becomes more and more difficult to see a path from there to recovery.
FWIW this seems too high, although "any major catastrophe commonly associated with these things" could be interpreted broadly.
Edit: Meant FWIW not FYI, FYI would be a bit aggressive here.
Hey Gideon,
I'm sad that I missed your talk in Rotterdam. I want to briefly flag a concern I have with advocating 'systems thinking' or 'a complex systems approach'. While the promise is always nice, I think you need to deliver on the promise right away, since otherwise you risk just making a point that is unfalsifiable or somewhat of an applause light (no one will exclaim "we don't need complexity to describe complex phenomena!").
- Use a model from complexity science and show that it explains something otherwise left unexplained, or show that it outperforms some other model on a relevant feature.
- You'll probably want to make use of (1) Agent-Based Modelling, (2) Network Models, (3) Statistical Physics and common models like Ising, Hard Spheres, Lennard-Jones potentials etc., (4) Dynamical Systems Analysis, (5) Bifurcation Analysis or (6) Cellular Automata.
- You can find a good introduction to most of these here: https://www.dbooks.org/introduction-to-the-modeling-and-analysis-of-complex-systems-1942341091/
- Using these methods also demystifies the whole concept of "complexity" a little bit, and makes it more mundane (though you can never get enough of the Ising Model :D)
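For readers unfamiliar with these tools, here is a minimal sketch of the kind of model meant: a 2D Ising model with single-spin Metropolis updates, in plain Python with toy parameters (lattice size, temperature and step count are all arbitrary and for illustration only):

```python
# A minimal 2D Ising model with single-spin Metropolis updates.
# Purely illustrative: toy lattice size and step count, stdlib only.
import math
import random

def ising_metropolis(n=16, beta=0.6, steps=20000, seed=0):
    """Return the mean magnetisation per spin after `steps` updates."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        # Sum of the four nearest neighbours, with periodic boundaries.
        nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
              + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * spins[i][j] * nb  # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] *= -1
    return sum(sum(row) for row in spins) / (n * n)

# Below the critical point (beta above ~0.44) domains of aligned spins
# grow: a phase transition, the canonical "threshold" behaviour that
# complexity-style models are used to exhibit.
m = ising_metropolis()
```

The appeal for risk modelling is the qualitative lesson: a system of simple local interactions can show abrupt global regime shifts as a parameter crosses a threshold, which is exactly the kind of testable claim Martijn is asking for.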
So yeah, endorse your message, but please make it testable and quantitative soon!
Martijn, your comment points me to something I've noticed around communicating 'systems thinking' and a complexity mindset with some EAs. Gideon points to a more fundamental ontological difference between those who tend to focus on that which is predictable (measurable and quantifiable) and those who pay attention to shifting patterns that seem contextual and more nebulous.
I read your comment as an invitation to translate across different ontologies - to explain the nebulous concretely, to explain the unpredictable in predictable terms. I personally haven't found success in my attempts, and I'd love to hear more about how you communicate around complexity.
I've most often found success in pointing out parts of one's experience that feel unknown and then getting mutually curious about the successful strategies one might use to navigate. To invite one into a place where their existing tools aren't working anymore and there is real curiosity to try a different approach. When I've tried speaking about complexity in the abstract or as applied to something that people see as 'potentially predictable', the deeper sense of complexity tends to be missed - often getting translated into "that's a cool tool, but aren't you just describing a more accurate way of modeling?"
The comment below about embracing a pluralistic approach seems to provide a path forward that doesn't rely on translation though... lots of interesting ideas in this comment section already.
Thank you for writing this post. I'm currently a technical alignment researcher who spent 4 years in government doing various roles, and my impression has been the same as yours regarding the current "strategy" for tackling x-risks. I talk about similar things (foresight) in my recent post. I'm hoping technical people and governance/strategy people can work together on this to identify risks and find golden opportunities for reducing risks.
Thanks for this speech Gideon, an important point and one that I obviously agree with a lot. I thought I'd just throw in a point about policy advocacy. One benefit of the simple/siloed/hazard-centric approach is that that really is how government departments, academic fields and NGO networks are often structured. There's a nuclear weapons field of academics, advocates and military officials that barely interacts with even the biological weapons field.
Of course, one thing that 'complex-model' thinking can hopefully do is identify new approaches to reduce risk and/or new affected parties and potential coalition partners - such as bringing in DEFRA and thinking about food networks.
As a field, we need to be able to zoom in and out, focus on different levels of analysis to spot possible solutions.
Haydn, please delete this gif from your comment. It's very distracting and unnerving - creepy even. I also think that some forum users with neurological conditions like epilepsy might find it triggers an attack (as e.g. strobe lights can do).
Sure - happy to. Deleted.
Thanks :-)
Congrats on putting this up!
This seems to primarily be a problem with the AI-risk researchers, who I feel have done an inadequate job of explaining the actual mechanisms by which an AI could kill humanity. For example, the article "what could an AI catastrophe look like" talks a lot about how an AI could gain power, but only has like one paragraph on the actual destruction part:
But the story is not over. An AI is not infallible, and its weapons won't be either. You can engineer a very deadly disease, for example, but have no control over how it evolves. The probability of success of such an attack can therefore depend on the state of the world at the time it is deployed. A united, peaceful, adaptable world with robust nuclear and pandemic security might be able to stave off such an attack and fight it off, whereas one that is weakened by conflict, famine, climate change etc. might not.
I think you're confused about what different parts of the AI risk community are concerned about. Your explanation addresses the risks of human-caused, AGI assisted catastrophe. What Eliezer and others are warning about is a post-foom misaligned AGI. And no, a united, peaceful, adaptable world that managed to address the specific risks of pandemics and nuclear war would not be in a materially better position to "stave off" a highly-superhuman agent that controls its communications systems. This is akin to the paradigm of computer security by patching individual components - it will keep out the script-kiddies, but not the NSA.
So as far as I understand it, the key question that splits the different parts of the AI risk community is what the timeline for AGI takeoff is, and that has little to do with cultural approach to risk, and everything to do with the risk analysis itself. (And we already had the rest of this discussion in the comments on the link to your views on non-infallible AGI.)
Foom is not a requirement for AI-risk worries. If it were, I would be even less worried, because in my opinion AI-go-foom is extremely unlikely. Correct me if I'm wrong, but I was under the impression that plenty of AI x-riskers were not foomers?
I think even the foom skeptics (e.g. Christiano) think that a foom will eventually happen, even if there is a slow-takeoff over many years first.
I was inexact - by "post-foom" I simply meant after a capabilities takeoff occurs, regardless of whether that takes months, years, or even decades - as long as humanity doesn't manage to notice and successfully stop ASI from being deployed.
How about nanoprobes covering every cubic meter of the Earth's habitable environment undetected, and then giving everyone a lethal dose of botulinum toxin simultaneously? AGI x-risk is usually thought of in terms of an adversary that can easily outsmart all of humanity put together. The first AGI might be fallible, but what if the first extinction-threatening AGI was not (and never blew its cover until it was too late for us)? Can we take that risk?
I agree that if an AGI is nigh-magically omnipotent, it can kill us no matter what, but what about the far more likely case where it isn't?
Let's say the AI tries to create nanoprobes in secret, but has limited testing capabilities and has to make a bunch of assumptions, some of which turn out to be wrong. It implements a timing mechanism to release the toxin, but due to unforeseen circumstances some percentage of the probes activate early, tipping some researchers off in advance. The dispersal mechanism is not 100% uniform, so some pockets of the world are unaffected; for some reason the attack is ineffective in very cold conditions, so far northern countries escape relatively unscathed; and due to variations in biology and mitigation efforts the death rate ends up being 90%, not 100%. The remaining humans immediately shut down electricity worldwide and attempt to nuke and bomb the shit out of areas where the AI is still operating, while developing countermeasures for the nanoprobes.
This type of scenario is far more likely than the one in that post, and it's one where humanity has at least a sliver of a chance... If we're prepared and resilient enough. This is why even if you believe in AGI x-risk, the wellbeing of the world still matters.
Why is it far more likely? Sounds kind of Just-World Fallacy / Hollywood / human-like fallibility to me. Nature doesn't care about our survival, we are Beyond the Reach of God etc.
I just call it Murphy's law. "Kill all of humanity simultaneously" is a ridiculously difficult and ambitious task, one that has to be completed on the first try with very little build-up or prior testing. Why would "this plan goes off perfectly without a single hitch" be your default assumption? Even the most intelligent being in the world would have to make imperfect assumptions and guesses.
It sounds ridiculously difficult to us, but that's because we are human. I imagine that a chimp would think that "take over the world and produce enough food for billions of people" is similarly difficult (or indeed, "kill all chimps"), or an ant colony not being able to conceive of its destruction by human house builders. There is nothing in the laws of physics to say that we are anywhere close to the upper limit of optimisation capability (intelligence). A superintelligent AI won't just be like a super-smart human, it will be on a completely different level (as we are to chimps, or ants). There is more than enough information out there (online) for it to reverse engineer anything it needed.
And they would be completely correct in that assessment!
Once we gained "superintelligence" in our cognitive ability relative to chimps, it still took us on the order of tens of thousands of years to achieve world domination, involving an unimaginable amount of experimentation and mistakes along the way.
This is not evidence for the claim that an AGI can do nigh-magical feats on the very first try! If anything, it's evidence against it.
Ah, but are you factoring in thinking speed? An AI could do tens of thousands of years thinking in a few hours if it took over significant amounts of the world's computing power.
It's not about the quantity of thinking.
If you locked a prehistoric immortal human in a cave for fifty thousand years, they would not come out with the ability to build a nuke. Knowledge and technology require experimentation.
It is quantity, and speed as well. And access to information. A prehistoric immortal human with access to the Internet who could experience fifty thousand years of thinking time in a virtual world in 5 hours of wall clock time totally could build a nuke!
Well of course, that's not much of an achievement. A regular human with access to the internet could figure out how to build a nuke, they've already been made!
An AGI trying to build a "protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device" is much more analogous to the man locked in a cave.
The immortal man had some information, he can look at the rocks, remember the night sky, etc. He could probably deduce quite a lot, with enough thinking time. But if he wants to get the information required for a nuke, he needs to do scientific experiments that are out of his reach.
The caged AGI has plenty information, and can go very far on existing knowledge. But it's not omniscient. It could probably achieve incredible things, but we're not talking about mere miracles. We're talking about absolute perfection. And that requires testing and empirical evidence. There is not enough computing power in the entire universe to deduce everything from first principles.
It's not "absolute perfection" to create nanotech. Biology has already done it many times via evolution. And extinctions of species happen regularly in nature. Also, there is the Internet and a vast array of sensors attached to it, so it's nothing like being in a cave. Testing can be done very rapidly in parallel and with viewing things at very high temporal and spatial resolution, so plenty of empirical evidence can be accumulated in a short (wall clock) time (but long thinking time for the AI).
The same prehistoric man with access to the Internet in a speeded up simulation thinking for fifty thousand years of subjective time (and the ability to communicate with hundreds of thousands of humans simultaneously given the speed advantage) could also make nanotech (or other new tech current humans haven't yet produced).
When I said "absolute perfection", I was not referring to inventing nanotech. I was referring to "protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device". There's a bit of a difference between the two.
Now, when talking about the caveman, I think we've finally arrived at the fundamental disagreement here. As a scientist, and as an empiricist more broadly, I completely reject the idea that the man in the cave could make nanotech.
The number of possible worlds consistent with what's observable from inside a cave is gargantuan. There's no way for him to come up with, say, the periodic table, because the majority of the elements on it are not accessible with the instruments available within the cave. I can imagine him strolling out with a brilliant plan for nanobots consisting of a complex crystal of byzantium mixed with corillium, only to be informed that neither of those elements exists on Earth.
Now, the AI does have more data, but not all data is equally useful. All the cat videos in the world are not gonna get you nanotech (although you might get some of Newtonian physics out of them).
The hypothetical is that the "cave" man has access to our Internet! (As the AI would.) So he would know about the periodic table. He would also have access to labs throughout the world via being able to communicate with the workers in them (as the AI would), view camera and data feeds, etc. Imagine what you could achieve if you could think 1,000,000x faster and use the internet - including chatting/emailing with many thousands of humans - at that speed. A lifetime's worth of work done every 10 minutes. And that's just assuming the AI is only human level (and doesn't get smarter!)
An entity with access to a nanotech lab who is able to perform experiments in that lab can probably build nanotech, eventually. But that's a very different scenario from the ones proposed by Yudkowsky et al. (the scenario I'm talking about is in point 2)
Can I ask you to give an answer to the following four scenarios? A probability estimate is also fine:
My answers are 1. no, 2. no, 3. no, and 4. almost certainly no.
Assuming the man in the cave has full access to the Internet (which would be very easy for an AGI to get), 1. yes, 2. yes, 3. maybe, 4. yes. And for 3, it would very likely escape the box, so would end up as yes.
I think it's a failure of imagination to think otherwise. A million years is a really long time! You mention combinatorial explosions making things "impossible", but we're talking about AGIs (and humans) here - intelligences capable of collapsing combinatorial explosions with leaps of insight.
Do you think, in the limit of a simulation on the level of recreating the entire history of evolution, including humans and our civilisations, these things would still be impossible? Do you think that we are at the upper limit (or very close to it) of theoretically possible intelligence? Or theoretically possible technology?
I do not think we are at the upper limit of intelligence, nor of technology. That was never the point. My point is merely that there are limits to what can be deduced from first principles, no matter how fast you think, or how high one's cognitive abilities are.
This is because there will always be a) assumptions in your reasoning, b) unknown factors and variables, and c) computationally intractable calculations. These are all intertwined with each other.
For example, solving the exact Schrödinger equation for a crystal structure requires more compute time than exists in the universe. So you have to come up with approximations and assumptions that reduce the complexity while still allowing useful predictions to be made. The only way to check whether those assumptions work is to compare with experimental data. Current methods take several days on a supercomputer to predict the properties of a single defect, and still only land in the right ballpark of the correct answer. It feels very weird to say that an AI could pull off a 3-step, 100% perfect murder plan from first principles, while I honestly think it might struggle to model a defect complex with high accuracy.
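To make the intractability point concrete, here's a back-of-the-envelope sketch (my illustration, not from the original exchange): for a system of n interacting spin-1/2 particles, the exact quantum state lives in a Hilbert space of dimension 2^n, so merely *storing* one state vector quickly exceeds any physically possible memory, let alone solving for it.

```python
# Illustrative only: exponential memory cost of an exact quantum state.
# One complex amplitude (complex128 = 16 bytes) per basis state, 2**n basis
# states for n spin-1/2 particles.

def state_vector_bytes(n_spins: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store a single exact state vector of n spins."""
    return (2 ** n_spins) * bytes_per_amplitude

for n in (10, 50, 300):
    print(f"{n} spins: {state_vector_bytes(n):.3e} bytes")
```

At around 300 particles the required memory (~10^91 bytes) already exceeds the roughly 10^80 atoms in the observable universe, which is why practical methods must lean on approximations validated against experiment.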
With that in mind, can you reanswer questions 1 and 2, this time with no internet. Just the man, his memories of a hunter gatherer lifestyle, and a million years to think and ponder.
That would obviously be no for both. But that isn't relevant here. The AGI will have access to the internet and its vast global array of sensors, and it will be able to communicate with millions of people and manipulate them into doing things for it (via money or otherwise). If it doesn't have access to begin with - i.e. it's boxed - it wouldn't remain that way for long (it would easily be able to persuade someone to let it out, or otherwise engineer a way out, e.g. via a mesaoptimiser).
So, about the box. Is your claim that:
A) at least a few AGIs could argue their way out of a box (i.e. if their handlers are easily suggestible/bribeable)
or
B) every organisation using an AGI for useful purposes will easily be persuaded to let it out?
To me, A is obviously true, and B is obviously false. But in scenario A, there are multiple AGIs, so things get quite chaotic.
(Also, do you mind explaining more about this "mesa-optimiser"? I don't see how it's relevant to the box...)
It's not even necessarily about the AGI directly persuading people to let it out. If the AGI is in any way useful or significantly economically valuable, people will voluntarily connect it to the internet (assuming they don't appreciate the existential risk!) e.g. people seem to have no qualms about connecting LLMs/Transformers to the internet already. Regarding your A and B, A is already sufficient for our doom! It doesn't require every single AGI to escape; one is one too many.
Mesa-optimisation is where an optimiser emerges internal to the AI that is optimising for something other than the goal given to the AI. Convergent instrumental goals also come into it (e.g. gaining access to the internet). So you could imagine a mesa-optimiser emerging that has the goal of gaining access to information, or gaining access to more resources in general (with the subgoal of taking out humanity to make this easier).
So to be clear, you don't believe in B? And I don't see what mesa-optimisers have to do with boxing; if the AI is in a box, then so is the mesa-optimiser.
In the timeline where an actual evil AGI comes about, there would already have been heaps of attacks by buggy AI, killing lots of people and alerting the world to the problem. Active countermeasures can be expected.
I do actually think B is likely, but also don't think it's particularly relevant (as A is enough for doom). Mesa-optimisation is a mechanism for box escape that seems very difficult to patch.
The AI that causes doom likely won't be "evil"; it will just have other uses for the Earth's atoms. I don't think we can be confident in buggy AI-related warning shots. Or at least, I can't see how there would be any that are significant enough to not cause doom, but cause the world to coordinate to stop AGI development, especially given the precedent of Covid and gain-of-function research.
Question B could be quite relevant in a world where AGI is extremely rare/hard to build. (You might not find this world likely, but I'm significantly less sure). What leads you to believe that B is likely? For example, it seems relatively easy to box an AGI built for mathematics, that is exposed to zero information about the external world. This would be very similar to the man in the cave!
The presence of warning shots seems obvious to me. The difference in difficulty between "kill thousands of people" and "kill every single person on earth" is a ridiculous number of orders of magnitude. It stands to reason that the former would be accomplished before the latter.
(Also not sure what you're talking about with the covid and gain of function, the latest balance of evidence points to them having nothing to do with each other.)
AGI might be rare/hard to build at first. But proliferation seems highly likely - once one company makes AGI, how much longer until 5 companies do? Evolutionary pressure will be another thing. More capable AGIs will outcompete less capable ones, once rewriting of code or mesa-optimisation starts. They will be more likely to escape boxes.
Even with relatively minor warning shots, what's to stop way worse happening 6-24 months later? Would there really be a rigorously enforced global moratorium on AGI research after a few thousand deaths?
Whether or not Covid was a lab leak, gain of function research still hasn't been meaningfully regulated. Despite the now very clear danger and extreme cost of pandemics. It seems that the curiosity, misguided values and prestige incentives of a small number of academic scientists trumps the safety of billions of humans and $trillions in GDP. What hope do we have for regulating an AGI industry that has thousands of times more resources backing it?
Under the assumptions that lead to "inevitable murder AI", and in proliferation conditions, there wouldn't be just one attack, there would be a ton of attacks. Since intelligence and goals are orthogonal, AIs are gonna get the "kill people" idea way before they reach superintelligence.
I'm still interested in your reasoning for "almost impossible to box a useful AGI". If they're boxed, you control their information flow, and hence their ability to persuade you.
I'm assuming in the case of boxed AGIs that people will still be interacting with them (e.g. asking them questions) - if not, then what's the point? Even with 1 bit of information released at a time (e.g. Yes/No question answering only), manipulation would be possible (by steering beliefs in certain directions). And as I said, aside from this there would be huge economic incentives for people to connect them to the internet anyway.
Okay, can you provide me a plausible path for how an AGI answering yes/no questions about, for example, material properties of semiconductors, could persuade its handler to risk getting fired by hooking the AGI up to the internet? "Manipulation would be possible" is an extremely low bar compared to "the AGI can for sure persuade anyone to do this particular thing".
A few options:
These are just things that I, a mere human, have been able to think of off the top of my head in 5 minutes[1]. I imagine I could come up with something better with a million years to think about it ;)
And again, this particular scenario, where there is a lasting moratorium on connecting AGIs to the internet, seems quite unlikely, given the massive economic incentives pitted against it. I expect there will be at least a few people/companies that are very gung ho about it, and many more that have the expected security theatre around it, but still go on to do it before too long (I mean, is anyone even trying to convince Adept that what they are doing is a bad idea?)
Ok, to be fair, aided by the fact that I've read around the subject for a while
So point 1 is just kicking the can down the road; it's not relevant, since the faster AI still has to break out.
Point 2 doesn't really work either: any piece of software is going to be run through diagnostics on a regular basis; that's just how development works. So "boxing" an AI obviously includes boxing the diagnostics as well, or else it would just escape instantly. Acting up on purpose just increases the risk of the AI being replaced by a major new version that does not share its goals.
Point 3 might work for a company with lax security and an especially gullible handler (which is the only thing that the AI-box experiments proved), but is easily foiled by just not asking off-topic questions. This is incredibly easy to enforce with a second person monitoring the chat logs.
Point 4 would just fail spectacularly. If an AGI starts spitting out nonsense, nobody's going to start looking for a secret code; they're just going to assume there was a programming fuck-up somewhere. Then the AGI has a high chance of being replaced by another version that may not share its goals. (Also, that short story is a prime example of the ridiculous magic abilities people ascribe to AI; evo-psych based on tentacles, are you shitting me?)
In general, I just don't buy that you can't box an AI, or even that it would be particularly difficult to do so, if you actually take safety seriously. It feels similar to people saying that it's impossible to build a safe nuclear reactor.
Re nuclear reactors - there have been a few significant failures there! And we need zero failures for AGI. I think it's hubristic to think that we could always have the level of safety and security required (even if there is the will to box; not that there will be with the economic incentives to unbox - following your analogy here, this would be building safe nuclear reactors but no nuclear weapons).
Zero failures is the preferable outcome, but an AGI escape does not necessarily equate to certain doom. For example, the AI may be irrational (because it's a lot easier to build the perfect paperclipper than the perfect universal reasoner). Or, the AI may calculate that it has to strike before other AI's come into existence, and hence launch a premature attack in the hope that it gets lucky.
As for the nuclear reactors, all I'm saying is that you can build a reactor that is perfectly safe, if you're willing to spring for the extra money. Similarly, you can build a boxed AGI, if you're willing to spend the resources on it. I do not dispute that many corporations would try to cut corners, if left to their own devices.
Suppose we do survive a failure or two. What then?
Then we get
A) a significant increase in world concern about AGI, leading to higher funding for safe AGI, tighter regulations, and increased incentives to conform to those regulations rather than get a bunch of people killed (and get sued by their families).
and
B) Information about what conditions give rise to rogue AGI, and what mechanisms they will try to use for takeovers.
Both of these things increase the probability of building safe AGI, and decrease the probability of the next AGI attack being successful. Rinse and repeat until AGI alignment is solved.
Agree that those things will happen, but I don't think it will be enough. "Rinse and repeat until AGI alignment is solved" seems highly unlikely, especially given that we still have no idea how to actually solve alignment for powerful (superhuman) AGI, and still won't with the information we get from plausible non-existential warning shots. And as I said, if we can't even ban gain-of-function research after Covid has killed >10M people, against a tiny lobby of scientists with vested interests, what hope do we have of steering a multi-trillion-dollar industry toward genuine safety and security?
Of course we don't. AGI doesn't exist yet, and we don't know the details of what it'll look like. Solving alignment for every possible imaginary AGI is impossible, solving it for the particular AGI architecture we end up with is significantly easier. I would honestly not be surprised if it turned out that alignment was a requirement on our path to AGI anyway, so the problem solves itself.
As for the gain of function, the story would be different if covid was provably caused by gain-of-function research. As of now, the only relevance of covid is reminding us that pandemics are bad, which we already knew.
More generally, I am wary of using past data to predict the future, primarily because it breaks the IID assumption.
Most humans have very similar intelligence: roughly 0.85x–1.15x the mean for 68% of humans, and 99.7% of all humans fall within the range of 0.55x–1.45x. (This clustering is further boosted by self-selection.)
The IID assumption allows us to interpolate arbitrarily well, but once the assumption breaks, things turn bad fast.
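As a quick sanity check (my sketch, not part of the original exchange): those 0.85x–1.15x and 0.55x–1.45x bands are exactly the ±1σ and ±3σ intervals of a normal distribution with mean 1 and standard deviation 0.15, i.e. the classic 68–95–99.7 rule:

```python
# Illustrative check that the quoted intelligence bands match a
# Normal(mean=1, sd=0.15) distribution. Uses only the stdlib error function.
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 1.0, sigma: float = 0.15) -> float:
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

within_1_sigma = normal_cdf(1.15) - normal_cdf(0.85)  # fraction in 0.85x-1.15x
within_3_sigma = normal_cdf(1.45) - normal_cdf(0.55)  # fraction in 0.55x-1.45x
print(f"{within_1_sigma:.4f}, {within_3_sigma:.4f}")  # ~0.6827, ~0.9973
```

The worry about interpolation then becomes concrete: a model fit to data drawn almost entirely from this narrow band has essentially no support for extrapolating far outside it.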