Riesgos Catastróficos Globales has conducted a literature review and an expert elicitation exercise[1] to categorize concrete risks associated with Artificial Intelligence (AI). This is part of our ongoing work on the implementation of the EU AI Act in Spain.
Here we present a short overview of the risks we have found. It is meant as a mental framework for policymakers to consider when developing AI policies, but we think it may also help stimulate discussion within the community. Please feel free to leave your thoughts as comments!
To facilitate comprehension, we have split the identified risks into two categories: adversarial and structural risks. Adversarial risks are those caused by the direct action of an agent, be it rogue groups, state actors, or misaligned AI. Structural risks are those derived from the wide-scale or high-impact deployment of AI, with diffuse causes.
The distinction builds upon the categorization of accidents, misuse, and structural risks (Zwetsloot & Dafoe, 2019). We merged the first two because accidents (AI misalignment) and misuse (humans exploiting an AI system to cause harm) do not always materialize as clearly distinct threats[2].
As for this materialization, we outline risks by integrating present and future implications. That is to say, we argue that their long-term impact is potentially large, but we ground them in existing, if modest, evidence. This choice is based on the assumption that policymakers tend to be skeptical of speculative framings. The underlying logic we try to convey is that damage will grow along with capabilities and deployment.
We have identified nine concrete risks within these categories, which are summarized in the table below. The categorization is not perfect, but we tried to prioritize clarity and concreteness over accuracy and exhaustiveness[3].
| Risk category | Risk | Example vignette |
| --- | --- | --- |
| Adversarial risks: directly caused by agents, either humans or misaligned AI | Cyberattacks and other unauthorized access | LLM-enabled spear-phishing campaigns |
| | Strategic technological development | Development of a new biological weapon |
| | User manipulation | Individuals persuaded to support a certain political option |
| Structural risks: caused by widespread automation | Job market disruption | 10% increase in unemployment over a year |
| | Socioeconomic inequality | Leading companies capturing AI-created surpluses |
| | Bias amplification | Minority groups being systematically denied access to housing or loans |
| | Epistemic insecurity | Proliferation of deep fakes |
| | Faulty automation of critical processes | Accidental nuclear attack from fully-automated C&C |
| | Defective optimization | Hospitals rejecting patients with serious conditions to maximize performance metrics |
We briefly introduce these risks below, together with references for further reading.
Adversarial risks
This section compiles potential threats from rogue human actors and misaligned AI. The final list coincides with what Shevlane et al. (2023) call "extreme risks" and maps loosely onto the distinction between digital, physical, and political dimensions proposed by Brundage et al. (2018).
Readers might note that our selected risks are commonly mentioned as instances of power-seeking behavior. We have not included vignettes about goal misspecification and misgeneralization for two reasons: they tended to be too vague and, to be impactful, most of them required the instrumental use of the actions listed below.
- Cyberattacks and other unauthorized access
AI promises to enhance the execution of cyber offenses, increasing their scale and impact (Brundage et al., 2018). New tools can automate manual tasks (see Mechanical Phish for vulnerability detection), improve current techniques (see GPT-4 for spear-phishing campaigns), and add new capabilities (see DeepDGA for evasion) (Aksela et al., 2021). Likewise, AI systems themselves harbor specific vulnerabilities that adversaries can exploit to alter their behavior, such as data poisoning (Schwarzschild et al., 2021) or prompt injection (Perez & Ribeiro, 2022).
Both humans and AI systems could try to accumulate power through various cyber operations, for example by stealing financial resources, accessing command-and-control systems, obtaining sensitive data, or self-replicating. In some cases, AI systems could even infiltrate offline devices by finding unexpected ways of interacting with the environment – see Bird & Layzell (2002) for an example.
- Strategic technological development
AI-powered weapons introduce an accountability gap (Sparrow, 2007) and are prone to making incorrect decisions, mainly due to out-of-distribution errors (Longpre et al., 2022). Adversaries trying to manipulate their performance could aggravate this problem (Eykholt et al., 2018). At the same time, AI lowers the barriers to entry for inflicting large-scale damage, favoring rogue actors (Kreps, 2021). This includes lethal autonomous weapon systems (LAWS), but also the facilitation of biological weapons development (Urbina et al., 2022), among other possibilities.
Besides offensive uses, AI could also confer decisive strategic advantages by enabling scientific innovations with high practical impact. Monopolizing such an innovation could ensure undisputed hegemony for an actor, dangerously altering the balance of power. Take as an example nuclear fusion, where deep learning has already contributed to efforts to stabilize and control the plasma (Degrave et al., 2022), calculate its electric field (Aguilar & Markidis, 2021), and predict disruptions (Kates-Harbeck et al., 2019).
- User manipulation
AI systems excel at profiling users and at persuasion. Relatively simple algorithms are already able to exploit human heuristics and influence individual preferences (Agudo & Matute, 2021), decision-making (Dezfouli et al., 2020), and public opinion (Schippers, 2020).
APS (advanced, planning, strategically aware) systems could manipulate users through more sophisticated techniques, including emotional manipulation or extortion. For example, GPT-4 was already able to convince a human contractor to solve a CAPTCHA on its behalf (OpenAI, 2023).
Structural risks
This section compiles risks derived from the wide-scale or high-impact deployment of AI systems. The consequences of wide-scale deployment are diverse, ranging from economic effects to fairness and political stability. High-impact deployment refers to the consequences of relinquishing human decision-making power in critical processes, which raises concerns about lack of supervision, increased speed, and bias towards quantifiable targets.
- Job market disruption
The recent emergence of generative AI has the potential to rapidly accelerate task automation, leading to significant disruption in the labor market. Hatzius et al. (2023) suggest that up to one-fourth of current jobs could be replaced by generative AI, with approximately two-thirds of jobs being exposed to some degree of automation. This could result in the automation of nearly 300 million jobs worldwide, affecting different countries to varying extents. To illustrate the risk more concretely, Eloundou et al. (2023) analyzed the impact of large language models and concluded that 19% of jobs in the U.S. have at least 50% of their tasks exposed to automation.
In addition, Jacobsen et al. (2005) found that it takes between one and four years to reemploy displaced workers, who are in any case unlikely to find jobs similar to their previous ones and therefore suffer long-term income losses.
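To make exposure figures like those above easier to interpret, here is a minimal, hypothetical sketch of how a statistic such as "19% of jobs have at least 50% of their tasks exposed" is typically constructed: exposure is judged at the task level and then aggregated per occupation, weighted by employment. The occupations, task flags, and employment counts below are invented; this is not the methodology of the papers cited above.

```python
# Hypothetical illustration: task-level exposure flags aggregated per occupation.
# Occupations, task flags, and employment figures are invented for this sketch.
occupations = {
    # name: (employment in thousands, 1 = task judged exposed to LLM automation)
    "paralegal":      (100, [1, 1, 1, 0]),
    "writer":         ( 50, [1, 1, 0, 0]),
    "plumber":        ( 80, [0, 0, 0, 1]),
    "radiology tech": ( 60, [1, 0, 0, 0]),
}

THRESHOLD = 0.5  # "at least half of the occupation's tasks are exposed"

def exposed_share(task_flags: list[int]) -> float:
    """Fraction of an occupation's tasks judged exposed."""
    return sum(task_flags) / len(task_flags)

exposed_employment = sum(
    employment
    for employment, tasks in occupations.values()
    if exposed_share(tasks) >= THRESHOLD
)
total_employment = sum(employment for employment, _ in occupations.values())
print(f"{exposed_employment / total_employment:.0%} of jobs have >=50% of tasks exposed")
```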
- Socioeconomic inequality
AI-generated value could be captured by AI companies or the countries where they are located, exacerbating wealth inequality (O’Keefe et al., 2020). The nature of AI facilitates unfair competition and economic concentration, with data being one of the main drivers of this trend (Acemoglu, 2021): in a feedback loop, more data improves quality, higher quality attracts users, and those users generate more data (Anderson, 2021). The resulting high barriers to entry hinder competition.
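As a purely illustrative sketch of this feedback loop (our own toy model, not drawn from the cited work), consider two competing products whose quality tracks their accumulated data and where each cohort of new users joins the product it perceives as better. The starting values and the winner-take-all user behavior are deliberately simplistic assumptions.

```python
# Toy model of the data -> quality -> users -> data loop. All numbers are
# arbitrary; the extreme assumption that every new user picks the currently
# better product is what makes the concentration dynamic easy to see.
def simulate(steps: int = 20, new_users_per_step: int = 1_000) -> list[float]:
    data = [1_050.0, 1_000.0]  # product A starts with a small (5%) data advantage
    for _ in range(steps):
        # Quality is assumed proportional to accumulated data, so new users
        # join whichever product currently has more data, feeding it further.
        winner = 0 if data[0] >= data[1] else 1
        data[winner] += new_users_per_step
    total = sum(data)
    return [round(d / total, 3) for d in data]

print(simulate())  # e.g. [0.955, 0.045]: the small head start compounds into dominance
```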
Beyond moral objections, the sociopolitical consequences of inequality could destabilize the world, e.g., by increasing the risk of riots and crime. Moreover, controlling such a differential technology would grant its owners excessive privilege, namely the possibility of unilaterally making political decisions of great importance to the rest of society.
- Bias amplification
AI systems adopt and reproduce biases found in their training datasets, which usually translates into disparate performance depending on how well a given group or topic is represented in the data. This is particularly concerning when it comes to critical decisions in healthcare, justice, or finance (Bommasani, 2022).
For instance, some algorithms have proved detrimental to Black defendants in recidivism prediction (Dressel & Farid, 2018) and to disadvantaged groups in credit approvals (Blattner & Nelson, 2021).
- Epistemic insecurity
Access to reliable information is key for a healthy society to make informed decisions; thus, the proliferation of misinformation poses a threat to national security (Seger et al., 2020).
AI exacerbates this risk in several ways. First, large language models (LLMs) are prone to accidental “hallucination” (Ji et al., 2023). Second, LLMs could be exploited by malicious actors to carry out influence operations (Goldstein et al., 2023), while image and video generation models are useful for creating deep fakes (Nguyen et al., 2022). Finally, AI-created content could contribute to information overload, undermining individuals’ ability to discern relevant and accurate information.
- Faulty automation of critical processes
Automating critical decision-making and management processes could cause serious incidents if the AI systems involved are prone to errors or not designed to respect human values and goals. Accidental errors are especially worrying because most AI systems are not sufficiently robust to distributional shifts (Amodei et al., 2016). If these processes run without human supervision, their speed could let incidents quickly spiral out of control (Scharre, 2018).
An extreme example is the automation of nuclear command and control. Transferring decision-making power would increase the probability of catastrophic errors in the interpretation of information. Consider the 1983 incident in the Soviet Union, when the early-warning system set off alarms after mistaking sunlight reflected off clouds for incoming intercontinental ballistic missiles. In that case, a human supervisor who decided to wait for more evidence prevented the launch of a Soviet retaliatory strike. For more on how machine learning affects nuclear risk, see Avin & Amadae (2019).
- Defective optimization
Since AI systems often seek to optimize a function, they tend to favor operationalizable and quantifiable objectives, neglecting other important values (Christiano, 2019). This is an instance of Goodhart’s law, which states that “when a measure becomes a target, it ceases to be a good measure” (Manheim & Garrabrant, 2019). For instance, GDP is a useful variable for certain purposes, but focusing excessively on it means ignoring factors such as subjective satisfaction, inequality, or environmental impact (Pilling, 2018).
As AI permeates more decision-making areas, the gap between the result of optimization and the nuanced objective we would ideally want to achieve could widen. Content selection algorithms offer an early illustration: maximizing engagement seemed to be a good proxy for economic profit and even social discussion, but turned out to favor incendiary content (Munn, 2020), animosity (Rathje et al., 2021), and outrage (Brady et al., 2021). This toxicity deteriorates public debate and might even end up harming the platforms themselves by increasing social media fatigue (Zheng & Ling, 2021).
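To make the Goodhart dynamic concrete, here is a small, hypothetical simulation in the spirit of the hospital vignette from the table above; the numbers and functional forms are invented. A policy that maximizes the measured proxy (treatment success rate) by turning away severe cases scores far better on the metric than a policy that admits everyone, while delivering far less of the benefit we actually care about.

```python
# Hypothetical Goodhart's-law toy: optimizing a proxy metric ("success rate")
# diverges from the true objective (total benefit delivered to patients).
import random

random.seed(0)

# Each patient has a severity in [0, 1]: sicker patients are harder to treat
# successfully but benefit the most from being treated at all.
patients = [random.random() for _ in range(10_000)]

def success_prob(severity: float) -> float:
    return 1.0 - 0.6 * severity  # easy cases succeed more often

def benefit(severity: float) -> float:
    return severity  # severe cases gain the most from treatment

def evaluate(admitted: list[float]) -> tuple[float, float]:
    """Return (proxy metric: success rate, true objective: expected total benefit)."""
    success_rate = sum(success_prob(s) for s in admitted) / len(admitted)
    total_benefit = sum(success_prob(s) * benefit(s) for s in admitted)
    return success_rate, total_benefit

# Policy that games the metric: admit only low-severity (easy) patients.
metric_gamed, value_gamed = evaluate([s for s in patients if s < 0.2])
# Policy that ignores the metric: admit everyone.
metric_all, value_all = evaluate(patients)

print(f"easy cases only -> success rate {metric_gamed:.2f}, total benefit {value_gamed:,.0f}")
print(f"admit everyone  -> success rate {metric_all:.2f}, total benefit {value_all:,.0f}")
# Roughly: 0.94 vs 0.70 on the proxy, but an order of magnitude less real benefit.
```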
Conclusion
In this post, we proposed an AI risk landscape based on two categories: adversarial risks and structural risks. The former are mostly ways in which a misaligned, power-seeking agent could acquire human, financial, and technological resources. The latter usually involve collateral damage and pressures on the environment provoked by wide-scale and high-consequence deployment. All of them are present to some extent at the current level of development, and could worsen as AI capabilities or adoption increase.
Acknowledgments
We thank José Hernández-Orallo, Pablo Moreno, Max Räuker, and Javier Prieto for participating in our elicitation exercise. We also thank Rose Hadshar for her valuable feedback on this post. All remaining errors are our own.
- ^
The elicitation exercise was run by the RCG staff and fed with the contributions of four external experts (see Acknowledgements). In the document, we asked participants to “surface concrete risks derived from artificial intelligence, both present and future, that [they] would like to highlight to EU policymakers as risks to have present when designing regulation and auditing processes”.
- ^
One could go even further and argue that no human actor would be able to misuse an AI system if the latter is not prone to allowing that misuse.
- ^
Some other risks that we encountered but considered too speculative or limited in scope include “cognitive atrophy derived from user dependency on general-purpose AI”, “disruption of grading and placement systems”, and “resource exhaustion and emissions from training and deployment”.
Thanks, this work is likely to inform my own work significantly! One question: Have you considered ranking/rating the various categories for risk based on Metaculus predictions? One thing that interests me a lot are the Metaculus questions of the form "Ragnarök Question Series: If a global catastrophe occurs, will it be due to X?" These questions currently sum to 150% which makes me think there is overlap between especially AI and nuclear, as well as AI and bio - both categories you have identified. Would you have any idea of how to infer the risks of bio or nuclear from the Metaculus questions? I think it might perhaps also strengthen your work, as you can rank/rate the different categories at least in terms of causing large scale loss of life.
Ranking the risks is outside the scope of our work. Interpreting the Metaculus questions sounds interesting, though it is not obvious how to disentangle the scenarios that forecasters had in mind. I think the Forecasting Research Institute is doing some related work, surveying forecasters on different risks.