Preventing an AI-related catastrophe - Problem profile

Benjamin Hilton; 80000_Hours

This is a linkpost for https://80000hours.org/problem-profiles/artificial-intelligence

We (80,000 Hours) have just released our longest and most in-depth problem profile — on reducing existential risks from AI.

You can read the profile here.

The rest of this post gives some background on the profile, a summary and the table of contents.

Some background

Like much of our content, this profile is aimed at an audience that has probably spent some time on the 80,000 Hours website, but is otherwise unfamiliar with EA -- so it's pretty introductory. That said, we hope the profile will also be useful and clarifying for members of the EA community.

The profile primarily represents my (Benjamin Hilton's) views, though it was edited by Arden Koehler (our website director) and reviewed by Howie Lempel (our CEO), who both broadly agree with the takeaways.

I've tried to do a few things with this profile to make it as useful as possible for people new to the issue:

* I focus on what I see as the biggest issue: risks of power-seeking AI from strategically aware planning systems with advanced capabilities, as set out by Joe Carlsmith.
* I try to make things feel more concrete, and have released a whole separate article on what an AI-caused catastrophe could actually look like. (This owes a lot to Carlsmith's report, as well as Christiano's What failure looks like and Bostrom's Superintelligence.)
* I give (again, what I see as) important background information, such as the results of surveys of ML experts on AI risk and an overview of recent advances in AI and scaling laws.
* I try to honestly explain the strongest reasons why the argument I present might be wrong.
* I include a long FAQ of common objections to working on AI risk to which I think there are strong responses.

Also, there's a feedback form if you want to give feedback and prefer that to posting publicly.

This post includes the summary from the article and a table of contents.

Summary

We expect that there will be substantial progress in AI in the next few decades, potentially even to the point where machines come to outperform humans in many, if not all, tasks. This could have enormous benefits, helping to solve currently intractable global problems, but could also pose severe risks. These risks could arise accidentally (for example, if we don’t find technical solutions to concerns about the safety of AI systems), or deliberately (for example, if AI systems worsen geopolitical conflict). We think more work needs to be done to reduce these risks.

Some of these risks from advanced AI could be existential — meaning they could cause human extinction, or an equally permanent and severe disempowerment of humanity. ^[1] There have not yet been any satisfying answers to concerns — discussed below — about how this rapidly approaching, transformative technology can be safely developed and integrated into our society. Finding answers to these concerns is very neglected, and may well be tractable. We estimate that there are around 300 people worldwide working directly on this.^[2] As a result, the possibility of AI-related catastrophe may be the world’s most pressing problem — and the best thing to work on for those who are well-placed to contribute.

Promising options for working on this problem include technical research on how to create safe AI systems, strategy research into the particular risks AI might pose, and policy research into ways in which companies and governments could mitigate these risks. If worthwhile policies are developed, we’ll need people to put them in place and implement them. There are also many opportunities to have a big impact in a variety of complementary roles, such as operations management, journalism, earning to give, and more — some of which we list below.

Our overall view

Recommended - highest priority

This is among the most pressing problems to work on.

Scale

AI will have a variety of impacts and has the potential to do a huge amount of good. But we’re particularly concerned with the possibility of extremely bad outcomes, especially an existential catastrophe. We’re very uncertain, but based on estimates from others using a variety of methods, our overall guess is that the risk of an existential catastrophe caused by artificial intelligence within the next 100 years is around 10%. This figure could significantly change with more research — some experts think it’s as low as 0.5% or much higher than 50%, and we’re open to either being right. Overall, our current take is that AI development poses a bigger threat to humanity’s long-term flourishing than any other issue we know of.

Neglectedness

Around $50 million was spent on reducing the worst risks from AI in 2020, while billions were spent advancing AI capabilities.^[3] ^[4] While we are seeing increasing concern from AI experts, there are still only around 300 people working directly on reducing the chances of an AI-related existential catastrophe.^[2] Of these, it seems like about two-thirds are working on technical AI safety research, with the rest split between strategy (and policy) research and advocacy.

Solvability

Making progress on preventing an AI-related catastrophe seems hard, but there are a lot of avenues for more research and the field is very young. So we think it’s moderately tractable, though we’re highly uncertain — again, assessments of the tractability of making AI safe vary enormously.

Full table of contents

Acknowledgements

Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing the article or offering extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything I said; in fact, we’ve had many spirited disagreements in the comments on the article!)

This work is licensed under a Creative Commons Attribution 4.0 International License.

^[1]
We’re also concerned about the possibility that AI systems could deserve moral consideration for their own sake — for example, because they are sentient. We’re not going to discuss this possibility in this article; we instead cover artificial sentience in a separate article here.
^[2]
I estimated this using the AI Watch database. For each organisation, I estimated the proportion of listed employees working directly on reducing existential risks from AI. There’s a lot of subjective judgement in the estimate (e.g. “does it seem like this research agenda is about AI safety in particular?”), and it could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. My 90% confidence interval would range from around 100 people to around 1,500 people.
^{^}
It’s difficult to say exactly how much is being spent to advance AI capabilities. This is partly because of a lack of available data, and partly because of questions like:
* What research in AI is actually advancing the sorts of dangerous capabilities that might be increasing potential existential risk?
* Do advances in AI hardware or advances in data collection count?
* How about broader improvements to research processes in general, or things that might increase investment in the future through producing economic growth?
The most relevant figure we could find was the expenses of DeepMind from 2020, which were around £1 billion, according to their annual report. We’d expect most of that to be contributing to “advancing AI capabilities” in some sense, since their main goal is building powerful, general AI systems. (Although it’s important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)
If DeepMind accounts for about 10% of the spending on advancing AI capabilities, this gives us a figure of around £10 billion. (Given that there are many AI companies in the US, and a large effort to produce advanced AI in China, we think 10% could be a good overall guess.)
As an upper bound, the total revenues of the AI sector in 2021 were around $340 billion.
So overall, we think the amount being spent to advance AI capabilities is between $1 billion and $340 billion per year. Even assuming a figure as low as $1 billion, this would still be around 20 times the roughly $50 million spent on reducing the worst risks from AI in 2020.
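
The arithmetic behind this footnote can be checked with a minimal sketch. It uses only the figures quoted in this post; the function names `spending_ratio` and `implied_total` are hypothetical helpers introduced for illustration, and no currency conversion is attempted, so the pound and dollar figures are kept separate.

```python
# Figures quoted in the post (approximate, per year).
SAFETY_SPENDING_USD = 50e6      # ~$50 million on reducing the worst AI risks (2020)
CAPABILITIES_LOW_USD = 1e9      # lower bound used in the footnote
CAPABILITIES_HIGH_USD = 340e9   # upper bound: total AI-sector revenue (2021)
DEEPMIND_EXPENSES_GBP = 1e9     # ~£1 billion (DeepMind, 2020)
DEEPMIND_SHARE_GUESS = 0.10     # the footnote's guess at DeepMind's share


def spending_ratio(capabilities: float, safety: float) -> float:
    """Hypothetical helper: how many times larger capabilities spending is."""
    return capabilities / safety


def implied_total(known_spend: float, assumed_share: float) -> float:
    """Hypothetical helper: scale one organisation's spend up to a sector total."""
    return known_spend / assumed_share


if __name__ == "__main__":
    low = spending_ratio(CAPABILITIES_LOW_USD, SAFETY_SPENDING_USD)
    high = spending_ratio(CAPABILITIES_HIGH_USD, SAFETY_SPENDING_USD)
    total_gbp = implied_total(DEEPMIND_EXPENSES_GBP, DEEPMIND_SHARE_GUESS)
    print(f"Lower-bound ratio (capabilities : safety): about {low:.0f}x")
    print(f"Upper-bound ratio (capabilities : safety): about {high:.0f}x")
    print(f"Sector total implied by the 10% guess: about £{total_gbp / 1e9:.0f} billion")
```

The result is sensitive to the assumed DeepMind share: halving `DEEPMIND_SHARE_GUESS` doubles the implied sector total, which is one reason the footnote gives such a wide range.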

Comments (18)

jackva 3y 20

How is it that after this being on top of the EA agenda for the better part of the last decade we still have only 300 people working on this?

Benjamin Hilton 2y 18

Yeah, it’s a good question! Some thoughts:

* I’m being quite strict with my definitions. I’m only counting people working directly on AI safety. So, for example, I wouldn’t count the time I spent writing this profile on AI (or anyone else who works at 80k for that matter). (Note: I do think lots of relevant work is done by people who don’t directly work on it.) I’m also not counting people who think of themselves as on an AI safety career path and are, at the moment, skilling up rather than working directly on the problem. There are some ambiguities, e.g. is the ops team of an AI org working on safety? In general, though, these ambiguities seem much smaller than the error in the data itself.
* AI safety is hugely neglected outside EA (which is a key reason why it seems so useful to work on). This isn't a big surprise and may be in large part a result of the fact that it used to be even more neglected, which means that anything that is started as an AI safety org is likely to have been started by EAs, so is also seen as an EA org. That makes AI safety a subset of EA rather than the other way round.
* Also, I'm looking at AI existential safety rather than broader AI ethics or AI safety issues. The focus on x-risk (combined with reasons to think that lots of work on AI non-existential safety isn't that relevant - as compared with e.g. bio where lots of policy work for example is relevant to major pandemics and existential pandemics) makes it even more likely that this is just looking at a strict subset of EAs.
* There are, I think, up to around 10 thousand engaged EAs - of those maybe 1-2 thousand are longtermism or x-risk focused. So we're looking at 10% of these people working full-time on AI x-risk! Seems like a pretty high proportion to me given the various causes in the wider EA (not even longtermist) community.
* So in many ways the question of "how are so few people working on AI safety after 10 years" is similar to "how are there so few EAs after 10 years", which is a pretty complicated question. But it seems to me like EA is way way way bigger and more influential than I would ever have expected in 2012!
* There are also some other bottlenecks (notably mentoring capacity). The field was nearly non-existent 10 years ago, with very few senior people to help others enter the field – and it’s (rightly) a very technical field, focused on theoretical and practical computer science / ML. Even now, the proportion of time those 300 people should be spending mentoring is very much unclear to me.

I'd also like to highlight the footnote alongside this number: "There’s a lot of subjective judgement in the estimate (e.g. “does it seem like this research agenda is about AI safety in particular?”), and it could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. My 90% confidence interval would range from around 100 people to around 1,500 people."

Lin BL 3y 1

Commenting as I'd also like to see a response to this. I guess it depends how they define 'working directly', perhaps emphasizing certain orgs? I am not focussed on AI myself, but I have spoken to so many EAs with an AI focus that, even if nobody is doing this outside of EA, this number seems surprisingly low. Not to say it isn't neglected!

weeatquince 2y 19

Amazing post. Really good clear write-up for the lay reader new to AI. I feel confident to share this.

One point where I worry that readers could take away the wrong impression is with the line that "we’re not yet at the point of knowing what policies would be useful to implement".

I agree with you that "we are in the early stages of figuring out the shape of this problem [AI governance] and the most effective ways to tackle it", but I worry that saying we don’t yet know what policies to advocate for (a fairly common trope among non-policy AI people in EA) gives a number of misleading impressions. It implies that AI policy advocacy work has no value at present, that people working on AI policy don’t know what they are doing and shouldn't currently be working in this area. I think this is wrong. Governments are putting AI policies in place now, and if we refuse to engage then we risk missing opportunities to make things better. There are clear cases where we know what better policy and worse policy look like.

Let's take one example directly from your own post. Your article says "If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously." That is a policy! Right now the US government is producing non-binding guidance for AI companies on how to manage the risks from AI. I am involved in some ongoing work to encourage this guidance to say that AI systems that (A) can self-improve and (B) present risks if they go wrong should be sandboxed and tested. I don’t at all think it is your intention to imply that EA should miss a policy opportunity to get AI companies to consider sandboxing (a thing you strongly agree with). But I worry that some non-policy people I talk to in the EA community seem to have views that approximate this level of dismissal for all current AI policy advocacy work (e.g. see views of funders here and here).

Note on other AI policies. I suggest a few things to focus on at point 3 here. There is the x-risk database of 250+ policy proposals here. There is work on policy ideas in Future Proof here. Etc.

Sjlver 3y 6

Thanks a lot for this profile!

It leaves me with a question: what is the possibility that the work outlined in the article makes things worse rather than better? These concerns are fleshed out in more detail in this question and its comment threads, but the TL;DR is:

* AI safety work is difficult: there are lots of hypotheses, experiments are hard to design, we can't do RCTs to measure whether it works, etc. Thus, there is uncertainty even about the sign of the impact.
* AI safety work could plausibly speed up AI development, create information hazards, be used for greenwashing regular AI companies... thereby increasing rather than decreasing AI risk.

I'd love to see a discussion of this concern, for example in the form of an entry under "Arguments against working on AI risk to which we think there are strong responses", or some content about how to make sure that the work is actually beneficial.

Final note: I hope this didn't sound too adversarial. My question is not meant as a critique of the article, but rather a genuine question that makes me hesitant to switch to AI safety work.

Arden Koehler 3y 13

(Responding on Benjamin's behalf, as he's away right now):

Agree that it's hard to know what works in AI safety + it's easy to do things that make things worse rather than better. My personal view is that we should expect the field of AI safety to be overall good because people trying to optimise for a thing will overall move things in its direction in expectation even if they sometimes move away from it by mistake. It seems unlikely that the best thing to do is nothing, given that AI capabilities are racing forward regardless.

I do think that the difficulty of telling what will work is a strike against pursuing a career in this area, because it makes the problem less tractable, but it doesn't seem decisive to me.

Agree that a section on this could be good!

Sjlver 3y 2

I appreciate the response, and I think I agree with your personal view, at least partially. "AI capabilities are racing forward regardless" is a strong argument, and it would mean that AI safety's contribution to AI progress would be small, in relative terms.

That said, it seems that the AI safety field might be particularly prone to work that's risky or neutral, for example:

* Interpretability research: interpretability is a quasi-requirement for deploying powerful models. Research in this direction is likely to produce tools that increase confidence in AI models and lead to more of them being deployed, earlier.
* Robustness research: Similar to interpretability, robustness is a very useful property of all AI models. It makes them more applicable and will likely increase use of AI.
* AI forecasting: Probably neutral, maybe negative since it creates buzz about AI and increases investments.

It's puzzling that there is much concern about AI risk, and yet little awareness of the dual-use nature of all AI research. I would appreciate a stronger discussion about how we can make AI actually more safe, as opposed to more interpretable, more robust, etc.

Benjamin Hilton 2y 7

I think these are all great points! We should definitely worry about negative effects of work intended to do good.

That said, here are two other places where maybe we have differing intuitions:

* You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign.
* It seems hard to conclude that the counterfactual where any one or more of "no work on AI safety / no interpretability work / no robustness work / no forecasting work" were true is in fact a world with less x-risk from AI overall. That is, while I can see there are potential negative effects of these things, when I truly try to imagine the counterfactual, the overall impact seems likely positive to me.

Of course, intuitions like these are much less concrete than actually trying to evaluate the claims, and I agree it seems extremely important for people evaluating or doing anything in AI safety to ensure they're doing positive work overall.

Sjlver 2y 2

Thanks for pointing out these two places!

"You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign."

Work on AI drives AI risk. This is not equally true of all AI work, but the overall correlation is clear. There are good arguments that AI will not be aligned by default, and that current methods can produce bad outcomes if naively scaled up. These are cited in your problem profile. With that in mind, I would not say that I'm confident that AI work is net-negative... but the risk of negative outcomes is too large to feel comfortable.

"It seems hard to conclude that the counterfactual where any one or more of 'no work on AI safety / no interpretability work / no robustness work / no forecasting work' were true is in fact a world with less x-risk from AI overall."

A world with more interpretability / robustness work is a world where powerful AI arrives faster (maybe good, maybe bad, certainly risky). I am echoing section 2 of the problem profile, which argues that the sheer speed of AI advances is cause for concern. Moreover, because interpretability and robustness work advances AI, traditional AI companies are likely to pursue such work even without an 80000hours problem profile. This could be an opportunity for 80000hours to direct people to work that is even more central to safety.

As you say, these are currently just intuitions, not concretely evaluated claims. It's completely OK if you don't put much weight on them. Nevertheless, I think these are real concerns shared by others (e.g. Alexander Berger, Michael Nielsen, Kerry Vaughan), and I would appreciate a brief discussion, FAQ entry, or similar in the problem profile.

And now I'll stop bothering you :) Thanks for having written the problem profile. It's really nice work overall.

Alix W 5mo 1

Thank you for this great overview! I might have missed it, but is there a link to work being done (or needing to be done) on helping people adapt and reskill for upcoming AI developments? This seems similar to the reskilling need linked to the move towards greener jobs. I can imagine that a real focus and opportunity lies here: ensuring that people whose current jobs or fields will be widely impacted by AI have the guidance and support to move towards a career that increases their sense of purpose and contribution, rather than leading them to a sense of loss of meaning and/or exclusion.

Thank you! Alix.

weeatquince 3y 4

Heads up that some of your links (e.g. those in "full table of contents") go to a page that reads: "Sorry, you are not allowed to preview drafts."

Amazing to see this out. Really excited to read it!!! :-)

Benjamin Hilton 3y 6

Ah thanks :) Fixed.

Jamie_Harris 2y 2

I'm a fan of the profile, especially the section on "What do we think are the best arguments we’re wrong?". I thought this was well done and clearly explained.

One important category that I don't remember seeing is wider arguments against existential risk being a priority. E.g. in my experience with 16-18 year olds in the UK, a very common response to Will MacAskill's TED talk (that they saw in the application process) was disagreement that the future was actually on track to be positive (and hence worth saving).

More anecdotally, something that I've experienced in numerous conversations, with these people and others, is that they don't expect/believe they could be motivated to work on this problem. (e.g. due to it feeling more abstract, less visceral than other plausible priorities.)

Maybe you didn't cover these because they're relevant to much work on x-risks, rather than AI safety specifically?

Vasco Grilo🔸 2y 2

(It’s also possible that it wouldn’t even be a good idea if we could [prevent the development of transformative AI] — after all, that would mean forgoing the benefits as well as preventing the risks.)

I think this is a good point. The goal is maximising the expected value of the future, not minimising the probability of the worst outcome.

[anonymous] 3y 2

Pardon me if this is an obvious reference around here, but what is the source for the "much higher than 50%" risk? My prior is that such percentages are too high to be taken seriously as a rational prediction, but precisely for that reason I'd be interested in challenging and updating.

Thomas Kwa 3y 11

Last I heard Nate Soares (at MIRI) has an all-things-considered probability around 80%, and Evan Hubinger recently gave ~80% too. Nate's reasoning is here, and he would probably also endorse this list of challenges.

I think you don't really have to have any crazy beliefs to have probabilities above 50%, just:

* higher confidence in the core arguments being correct, such that you think there are concrete problems that probably need to be solved to avoid AI takeover
* a prior that is not overwhelmingly low, despite some previous mechanisms for catastrophe like overpopulation and nuclear war being avoidable. The world is allowed to kill you.
* observation that not much progress has been made on the problem so far, and belief that this will not massively speed up as we get closer to AGI

Believing there are multiple independent core problems we don't have traction on, or that some problems are likely to take serial time or multiple attempts that we don't have, can drive this probability higher.

Yonatan Cale 3y 5

Adding Nate Soares's "AGI ruin scenarios are likely [...]"

Erich_Grunewald 🔸 3y 2

See e.g. Yudkowsky's AGI Ruin: A List of Lethalities. I think at this point Yudkowsky is far from alone in giving it >50% probability, though I expect that view is far less common in academia and among machine learning (capabilities) researchers.