
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe here to receive future versions.


The Trump Circle on AI Safety

The incoming Trump administration is likely to significantly alter the US government’s approach to AI safety. For example, Trump is likely to immediately repeal Biden’s Executive Order on AI.

However, some members of Trump’s circle appear to take AI safety seriously. The most prominent AI safety advocate close to Trump is Elon Musk, who supported SB 1047 earlier this year, but he is not alone. Below, we’ve gathered some promising perspectives from other members of Trump’s circle and incoming administration.

Trump and Musk at UFC 309. Photo Source.
  • Robert F. Kennedy Jr, Trump’s pick for Secretary of Health and Human Services, said in an interview with Forbes: “countries all over the world that are now developing very very high performance AI that is very frightening. We need to be talking to all of them and making sure that we develop treaties that protect humanity.”
  • Tulsi Gabbard, Trump’s pick to serve as Director of National Intelligence, said in an interview with Joe Rogan: "very real dangers of [AI] being weaponized… just like with nuclear arms race, there are no winners."
  • Representative Mike Waltz, Trump’s pick for National Security Advisor, said during an Atlantic Council seminar: “before we have full breakout [with AI] … get some regulatory structure around it… These breakout moments … could prove to be very dangerous.”
  • Vivek Ramaswamy, co-head of the new Department of Government Efficiency, said in an interview with KCCI: "if you're developing an AI algorithm today that has a negative impact on other people, you should bear the liability for it."

Others with potential influence in a Trump administration include Tucker Carlson, who in another Joe Rogan interview said: “Machines are helpmates, like we created them to help us, to make our lives better, not to take orders from them … If we agree that the outcome is bad and specifically it’s bad for people … then we should strangle it in its crib right now,” and Ivanka Trump, who tweeted about the risks of AGI.

Chinese Researchers Used Llama to Create a Military Tool for the PLA

Chinese military researchers have repurposed Meta's open-weight Llama model to create a military intelligence tool. According to Reuters, scientists from China's People's Liberation Army's Academy of Military Science used Meta's Llama 13B large language model to create an AI system called ChatBIT, designed to "gather and process intelligence, and offer accurate and reliable information for operational decision-making."

The researchers claim their system performs at approximately 90% of the capability of OpenAI's GPT-4, though they didn't specify how this was measured. The system was trained on 100,000 military dialogue records, which Reuters described as "a relatively small number compared with other LLMs."

Meta responded to the development, stating that any military use of their models violates their acceptable use policy, which explicitly prohibits applications in warfare and defense. "Any use of our models by the People's Liberation Army is unauthorized and contrary to our acceptable use policy," Meta told Reuters. (Three days later, Meta made their models available for US defense applications.)

However, because the model is freely available, Meta cannot enforce its use restrictions. It is prudent to assume that any open-weight model will be used by US adversaries and malicious actors. This implies that once models begin to develop military-relevant capabilities, they should not be released with open weights.

A Google AI System Discovered a Zero-Day Cybersecurity Vulnerability 

Google used an AI system to discover a “zero-day” software vulnerability. Google’s "Big Sleep" system, developed jointly by Project Zero and DeepMind, uncovered a security flaw in SQLite. The vulnerability, an exploitable stack buffer underflow discovered in October, had evaded detection by traditional testing methods. SQLite's development team fixed the issue before it appeared in an official release.

Launched in June 2024 under the name Project Naptime, Big Sleep uses large language models to perform vulnerability research by mimicking human security researchers' workflows. The system combines several specialized tools—including a Code Browser, debugger, and Python sandbox—to analyze code and identify potential security flaws.

Google describes this as the first public example of an AI system finding a previously unknown, exploitable vulnerability in widely used real-world software. The result suggests that AI systems can now surpass human capabilities in some cybersecurity tasks. This time, that capability was used defensively, but similar systems could be used offensively.

Complex Systems

Our new book, Introduction to AI Safety, Ethics and Society, is available for free online and will be published by Taylor & Francis next year. This week, we look at Chapter 5: Complex Systems, which analyzes AI safety by viewing AI as a complex system and exploring the unexpected behaviors such systems often exhibit. The video lecture for the chapter is available here.

Complex systems. One approach to describing systems, the reductionist paradigm, assumes that systems can be understood by breaking them down into their component parts. While this approach works well for mechanics and statistics, it fails to adequately describe complex systems. Complex systems consist of many parts whose interactions produce effects greater than the sum of the parts. Since both AI systems and the societies deploying them are complex systems, the effects of AI are difficult to predict.

A Rube Goldberg machine with many parts that each feed directly into the next can be well explained by the mechanistic approach.

Emergence. A defining characteristic of complex systems is emergence—the appearance of striking system-wide features that cannot be found in any individual component. These emergent features often spontaneously activate as the system grows in size. For example, small language models trained to produce coherent sentences may suddenly develop translation abilities or master three-digit arithmetic when scaled.

Feedback loops. Complex systems frequently exhibit feedback loops, where interdependencies between system components reinforce or suppress certain processes. The phenomenon of "the rich getting richer" exemplifies such a loop: wealthy individuals have more money to invest, generating even greater returns on investment. Similarly, AI systems tasked with decision-making could accelerate developments beyond human response capabilities, which would require deploying additional AI systems to keep up and so accelerate developments further. These feedback loops make complex systems difficult to predict, as minor variations in initial conditions can amplify into dramatically different outcomes.

Lessons for AI safety. The nature of complex systems yields several crucial insights. First, experimentation becomes essential, as it provides the only reliable way to identify newly emergent capabilities. Second, a system proven safe at one scale may become dangerous when enlarged, as its emergent properties could pose unexpected risks. Finally, any system dependent on human oversight becomes unreliable if feedback loops accelerate processes beyond human control.

AI safety is a 'wicked' problem. Complex systems frequently generate 'wicked' problems—particularly challenging issues that typically involve social components. These problems share three key characteristics:

  1. They lack a single explanation or solution
  2. Attempted solutions carry inherent risks
  3. The resolution point may remain unclear

AI safety qualifies as a wicked problem because it has no unified solution, carries risks in implementing solutions, and may require perpetual vigilance rather than reaching a definitive "solved" state.

See also: CAIS website, X account for CAIS, our $250K Safety benchmark competition, our new AI safety course, and our feedback form. The Center for AI Safety is also hiring for several positions.

Double your impact! Every dollar you donate to the Center for AI Safety will be matched 1:1 up to $2 million. Donate here.
