Summary:
- As AI safety enters the mainstream, many new and powerful actors seem to be taking interest in the field
- Some of these actors seem positioned to have an outsized influence on AI risk (e.g., governments that can pass strict regulations or fund lots of research; journalists who can shape public discourse)
- I claim that some of last year's AI alignment community should, on the margin, prioritize actively forming relationships with these new actors and setting them up to do productive, x-risk reducing work. I don't think enough of this will happen by default.
- It seems better to try to guide new actors' efforts sooner rather than later, since 1) the EA-adjacent, x-risk motivated AIS community may be at its peak influence (and might lose this quickly) and 2) new actors' efforts are probably path dependent, so it's important they start out on the right track.
- Concrete ways to help guide new actors to make productive contributions to AI safety include not putting off new actors, forming constructive relationships, distilling existing research, trying to shape the research paradigm, boosting the right memes, and genuinely trying to understand the perspective of newcomers.
New actors are taking AI risk seriously (thankfully!)
Tl;dr: There are a lot of new, powerful actors taking an interest in AI safety. I'd guess this is net positive, but as a result EA is losing its relative influence on AI safety.
A year ago, if you wanted to reduce the risk of AI takeover, you probably engaged with EA and adjacent communities: you probably read LessWrong, or the EA Forum, or the Alignment Forum; you probably attended EAG(x)s; you probably talked about your ideas for reducing AI risk with people who identified as EAs. Or at the very least you knew of these communities.
That’s changed. Over the past 3 months, powerful actors have taken an interest in catastrophic risks from AI, including existential catastrophes:
- AI risk made headlines all over the world following the FLI letter and CAIS letter
- Prominent academics like Yoshua Bengio and Geoffrey Hinton have publicly staked their reputations on AI posing an existential risk
- The UK government announced they’re hosting a “major global Summit on AI safety”
- The UN Secretary-General endorsed the framing of AI as an existential threat
EA and adjacent AI safety-focused communities – what I'll refer to as "last year's AI alignment community" from now on – are no longer the sole authority on AI risk. If you want to reduce large-scale risks from AI a year from now, it seems increasingly likely that you'll no longer have to turn to last year's AI alignment community for answers, and can instead look to other, more 'traditional' actors.
What exactly new actors like governments, journalists, think tanks, academics, and other talent pools do within AI safety is still to be determined – and EA-motivated people who have thought a lot about alignment are uniquely positioned to guide new actors' efforts to productively reduce x-risk.
As I’ll expand on below, I think now is an opportune time to shape the non-EA-adjacent portion of work done on AI safety.
These new actors may have a big influence on AI risk
The new actors who seem to be taking AI safety seriously often have credibility, status, or other forms of power that most actors active in the EA sphere don't have (yet).
The “new actors” I’m imagining include academics, journalists, industry talent, governments, think tanks, etc. that previously weren’t interested in AI safety but now are. These actors can do things EA actors currently can’t, like:
- Regulation and Legislation: The US government can pass a law requiring AI companies to submit their machine learning models for third-party audits to ensure ethical behavior and safety.
- Research Funding: The UK government can invest in a new research institute dedicated to exploring AI alignment for foundation models.
- Public Discourse Shaping: New "mainstream" journalists, academics, and representatives of government orgs can solidify public discourse on AI risk and keep it from being just a fad by openly and continuously discussing it.
- Diplomacy and International Cooperation: International orgs like the UN can spearhead global treaties, where signatories commit to, e.g., some form of chip monitoring.
- Diverse talent: New academics and industry players who are taking AI safety seriously could make intellectual progress on alignment by, e.g., changing their lab's research agenda to work on prosaic alignment.
- Incentivizing the Private Sector: Governments can shift alignment incentives by punishing companies that don't prioritize safe and ethical AI practices (or rewarding companies that do).
What should we do in light of the new actors taking an interest in AI safety?
Last year’s alignment community could respond to these actors taking an interest in AI safety in a variety of ways, which I see as roughly a continuum:
- Gatekeeping: Pride and a sense of ‘told you so’ make some existing alignment community actors respond defensively to requests for collaboration or bids from ‘outsiders’ to learn more.
- Business as usual: The alignment community keeps on like nothing happened. The new actors don’t see most of last year's alignment community as credible authorities (opting instead for people like Geoffrey Hinton), and most don’t try to form new connections. A schism forms.
- Greeting new actors openly and guiding their efforts to productively focus on existential risks: Some actors in the existing AI alignment community sacrifice object level work for things like relationship building, idea distillation, and scoring public credibility points. ~All existing actors do a bare minimum of the concrete actions listed below.
On the margin, I think we should be more actively forming relationships with new actors and setting them up to do x-risk reducing work, even if it's at the expense of some of our direct work.
- This is primarily driven by the expected impact I think these new actors could have on AI safety in the coming years (elaborated on above), such that even small changes to what they work on and prioritize could make a big difference.
- There’s also the basic intuition that more people with new expertise working on a hard problem just seems better.
- Now (or at least sooner rather than later) might be the opportune time to engage with new actors
- Last year's AI safety community may be at its peak influence (and might lose this quickly) as more voices of legible authority enter AI risk discussions
- New actors' efforts are probably path dependent, so influencing their initial trajectory seems particularly important (e.g., influencing a new academic lab's initial research agenda will likely have a greater effect than spending the same effort trying to reorient it years later).
- By default, I expect many in last year’s alignment community will have some pulls against coordinating with new actors:
- Status quo bias: People are reluctant to change.
- EA & rationality’s sense of exceptionalism: We sometimes think that we can do it best; that nobody has done it like us and first principles will take us to glory. I think this is sometimes true, but often naive.
- Disillusionment with existing power structures: Imagine you just spent years of your life working on a problem people scoffed at as sci-fi. And now those people want your help? I think it's reasonable to feel dismissive.
- Communication Barriers: Last year’s alignment community has a lot of its own terminology and communication norms – it’s costly to lose these in conversations with newcomers and have to explain many building blocks.
When I thought about ways EA could fail a year ago, I became most worried about the type of failures where the world just moves on without us – the worlds where we're not in the rooms where it happens (despite writing lots of good forum posts and tweets). I'm worried that's what could happen soon with AI risk. By not actively trying to position ourselves in existing power structures, we give up seats in the room where it happens to people who I think are less well-intentioned and less clear-eyed about what's at stake.
I may well be wrong about this ‘more coordination with new actors is possible and worth investing in’ take:
- As a general disclaimer, I’m not super tuned into the alignment community.
- I'm confident I'll have gotten things wrong in this post.
- I’m likely missing important context on: the degree to which people are already coordinating with new actors, how receptive the new actors are to coordination, and a host of unknown unknowns.
- The AI risk trend may not last, or it may be more smoke than fire.
- This seems to me like more than a phase, given the sheer degree of interest and backing AI x-risk has gotten from prominent people with something to lose – or at least it could be turned into more than a phase.
- There may be downsides to coordination I’m not tracking.
- Large-scale polarization around AI as an x-risk still seems like a big worry, and maybe coordination among certain people feeds into that in ways I’m not predicting.
- AI alignment resources and influence may be more zero sum than I’m imagining.
- When I think of a new academic expressing interest in AI alignment, or a new journalist taking note, I’m not imagining them cutting into some resource that would have been much better spent elsewhere.
- I can imagine extreme versions of setting up AI safety actors for success that include meaningfully sacrificing one’s own prospects of making future contributions to reducing AI risk (e.g., spending all your time helping others learn the basics of the problem when you could just be getting career capital for a senior policy position). I’m not arguing for such extreme trades or a complete handoff to a new group of actors – just more work on the margin.
Concrete calls to action (and vigilance)
Realistically, not everyone in last year's AI alignment community is well-positioned to coordinate with new actors; I think the burden here falls particularly on actors who have legible credibility outside EA and adjacent communities (e.g., a PhD in ML, a policy background, a role on the alignment team of an AI lab, a position at an AI org with a strong brand, some impressive CV items, etc.).
Some ex-SERI MATS scholar who is all in on theoretical agent foundations work might be best positioned to keep doing just that. But I think everyone can help in the effort of growing the number of people doing useful alignment work (or at least not putting people off!).
Here's my best guess at concrete things people in last year's alignment community could do:
- Don’t put people off AI alignment
- Genuinely engage with criticisms. Don't be dismissive and act like you have all the answers.
- Respect existing power structures. Academics and policy-makers think highly of themselves and like when others do too.
- Recognise your position as an ambassador for an entire community. Whether you like it or not, when you engage with people as someone ostensibly in the EA/alignment/rationality/AI x-risk community – be it on twitter or in conversations – recognize that people are extrapolating many other people’s behavior and worldview from yours.
- Seek out constructive relationships
- Success in policy and other areas seems to be largely a function of who you know and how many people know and like you – so get to know influential people and make them like you.
- Distill existing research
- The format can vary depending on your expertise: publications, plain-language explainers, long-form videos, short-form videos, new workshops, hackathons, etc.
- When distilling research (or just communicating in general) try to limit jargon and speak the language of your audience (e.g., merge alignment terminology with existing ML terminology or policy-speak).
- Funnel x-risk motivated AI safety talent into productive places
- Both individuals and orgs should try to map out which orgs might make the biggest decisions during the AI 'endgame' and try to enter those spaces (or create a talent pipeline to get people into those spaces).
- People in last year's alignment community seem to have a head start, which is good news! But it seems likely that talent less motivated by x-risk will enter these spaces in the coming years.
- Guide the research paradigm
- I’m not sure how intellectual progress gets made in fields like AI alignment (this is an empirical question I haven’t researched) but my guess is that it relies heavily on progressively better frameworks, taxonomies, exposing of assumptions, etc.
- As one of the communities that has thought about the problem for the longest, we may have a strong comparative advantage for this work.
- Boost the right memes
- Certain memes (bite-sized ideas that some people describe as having a life of their own) seem much more contagious than others, and it'll be important to track which are more palatable to the people one is hoping to convince. E.g., "P(doom)" is an expression that might best be left in last year's alignment community...
- Become the experts in the room
- Where possible, I think people in last year’s alignment community should try to leverage the authority and raw expertise that comes with having thought about this longer than most.
- My guess is that, on the margin, some people working on AI risk who never expected to become public-facing and don’t love the idea should now consider becoming public facing.
- Patch the gaps that others won’t cover
- E.g., if more academics start doing prosaic alignment work, then 'big-if-true' theoretical work, or high-quality work on digital sentience, may become relatively more valuable.
- There are probably predictable 'market failures' in any discipline – work that isn't sexy but is still very useful (e.g., organizing events, fixing coordination problems, distilling the same argument into new language, etc.).
- Generally, track whether top-priority work is getting covered (e.g., information security, standards and monitoring).
- Try to understand and learn from newcomers' reactions
- When talking to people just taking an interest in AI risk, I think there's a danger of not appropriately registering their counterarguments. Even if you're confident you're right, you'll have a much more productive relationship if you meet them where they are and address common confusions early on.
- There will be people taking an interest in this problem from perspectives that haven’t contributed to AI safety so far, some of whom will likely be in a position to make novel contributions. Hear those out.
Closing
As the rest of the world watches the AI risk discourse unfold, now strikes me as a time to let the virtues of EA show: a clear-eyed commitment to working for the betterment of others, a relentless focus on what is true of the world, and the epistemic humility to know we may well be wrong. To anyone trying to tackle a difficult problem where the fate of humanity may be at stake, we should be valuable allies; we are on the same side.
I’m curious whether others agree with this post's argument, or whether they think there are better frames for orienting to the influx of attention on AI risk.
Thank you to Lara Thurnherr for comments on a draft of this post, and Nicholas Dupuis for the conversation that spurred me to get writing. All views are my own.
I liked this!
I appreciated that, for the claim I was most skeptical of ("There's also the basic intuition that more people with new expertise working on a hard problem just seems better"), my skepticism was anticipated and discussed.
For me one of the most important things is:
This, plus avoiding and calling out safety washing, keeping an eye out for overall wrongheaded activities and motions (for instance, probably a lot of regulation is bad by default and some could actively make things worse), seem like the strongest arguments against making big naive shifts because the field is in a broad sense less neglected.
More generally, I think a lot of the details of what kinds of engagements we have with the broader world will matter (and I think in many cases "guide" will be a less accurate description of what's on the table than "be one of the players in the room"). Some engagements might be a lot more impactful than others, but I don't have super fleshed-out views on which yet!