If you work for a frontier AI company, whether because you think they care about saving the world or, especially, because you think you will be the one to influence them, you are deluded. Wake up and quit.

If you care about protecting the world, you will quit, even though it will be hard to give up the money and the prestige and the hope that they would fix the problem. The actual path to reducing AI risk is not as glamorous or as clear at this point as following the instructions of a wealthy and well-organized corporation, but at least you will be going in the right direction. 

The early 80k-style advice to work at an AI lab was mainly to make technical discoveries for safety that e.g. academia didn't have the resources for. When the labs were small, it also made some sense to try to influence the industry culture. Now, this advice is crazy-- there is no way one EA joining a 1,000-person company with duties to its investors, locked in a death race, is going to "influence" it. The influence goes entirely the other way. If you weren't frogboiled, you would never have selected this path for influence.

There's a lot more to say on this, but I think this is the crux. Your chance for positive marginal impact for AI Safety is not with the labs. If you work for the labs, you're probably just a henchman for a supervillain megaproject, and you can have some positive counterfactual impact right now by quitting. Don't sell out.

I downvoted this (but have upvoted some of your comments).

I think this advice is at minimum overstated, and likely wrong and harmful (at least if taken literally). And it's presented with rhetorical force, so that it seems to mostly be trying to push people's views towards a position that is (IMO) harmful, rather than mostly providing them with information to help them come to their own conclusions.

TBC:

  • I think you probably have things to add here, and in particular feel quite curious what's led you to the view that people here inevitably get corrupted (which doesn't match my impression), or how you think that corruption manifests
  • I'm in favour of people having access to the "henchman of a supervillain" perspective (which could help them to notice things they might otherwise overlook); the thing I'm objecting to is rhetorically projecting it as the deep truth of the situation (which I think it isn't)
Jason

I don't have an opinion on whether Holly is correct that no one should work for the labs. But even for those who disagree, there are some implied hypotheses here that are worth pondering:

  • People systematically underestimate how much money, power, and prestige will influence their beliefs and their judgment.
  • People systematically overestimate how much influence they have on others and underestimate how much influence others have on them. Editorializing on my own, I suspect that almost everyone thinks of themselves as a net influencer, but net amount of [influence on others - influence by others] in a system seemingly has to be zero.

If people decide to work in a frontier lab anyway, to what extent can they mitigate the risk of being "frogboiled" by

  • having a plan to evaluate -- as objectively as possible -- whether they are being influenced in the ways Holly describes. (What would this look like?);
  • living well beneath the AI-lab salary and chipmunking away most of the excess, reducing the risk that they will feel psychological pressure to continue with a lab to maintain their standard of living;
  • going out of their way to ensure enough of their social lives / support is independent of the lab, so that their desire to maintain that support will not lead them to stay with the lab if that no longer seems wise;
  • publicly committing to yellow lines under which they would seriously consider reversing course, and red lines under which they pre-commit to doing so;
  • or something else?

(I'm open to the response that there are no meaningful detection and/or mitigation techniques.)

In my view, there are many good reasons to work at an AI company, including:
* productively steering an AI lab during crunch time
* doing well-resourced AI safety research
* increasing the ability for safety-conscious people to blow the whistle to governments
* learning about the AI frontier from the best people in the field
* giving to effective charities
* influencing the views of other employees
* influencing how powerful AI systems are deployed and what they are used for during deployment

I don't think these necessarily outweigh the costs of working at an AI company, but the altruistic benefits are sometimes large, and it seems good for people to consider the option thoughtfully.
 

Can you list what you see as the costs?

Can you explain why you think doing safety work at these places is bad?

I think there was a time when it seemed like a good idea, back when the companies were small and there was more of a chance of setting their standards and culture. Back in 2016 I thought that, on balance, we should try to put Safety people in OpenAI, for instance. OpenAI was supposed to be explicitly Safety-oriented, but it seemed like it might pay off to stock any company's safety division with Safety people.

I think everything had clearly changed by the ChatGPT moment. The companies had a successful paradigm for making the models, the product was extremely valuable, and the race was very clearly on. At this time, EAs still believed that OpenAI and Anthropic were on their side because they had Safety teams (including many EAs) and talked a lot about Safety, in fact claiming to be developing AGI for the sake of Safety. But any actual ability of EA employees to push for safety measures that weren't also good for those companies' mission was already lost at this point, imo.

It was proven in the ensuing two years that the Safety teams at OpenAI were expendable. Sam Altman has used up and thrown away EA, and he no longer feels any need to pretend OpenAI cares about Safety, despite having very fluently talked the talk for years before. He was happy to use the EA board members and the entire movement as scapegoats.

Anthropic is showing signs of going the same way. They do Safety research, but nothing stops them from developing further, not even their former promises not to advance the frontier. The main thing they do is develop bigger and bigger models. They want to be attractive to natsec, and whether or not the actual decisionmakers at the top ultimately believe their agenda is for the sake of Safety, it's clearly not up to the marginal Safety hire or hinging on their research results. Other AI companies don't even claim to care about Safety particularly.

So, I do not think it is effective to work at these places. But the real harm is that working for AI labs keeps EAs from speaking out about AI danger, whether because they are under NDA, because they want to stay hireable by a lab, because they want to cooperate with people working at labs, or because they defer to their friends and general social environment and so think the labs (at least Anthropic) are good. imo this price is unacceptably high, and EAs would have a lot more of the impact they hoped to get from being "in the room" at labs by speaking out and contributing to real external pressure and regulation.

I agree that there could be an effect that keeps people from speaking out about AI danger. But:

  • I think that such political incentives can occur whenever anyone is dealing with external power structures, and in practice my impression is that these are a bigger deal for people who want jobs in AI policy than for people engaged with frontier AI companies
  • This argument has most force in arguing that some EAs should keep professional and social distance from frontier AI companies, not that everyone should
  • Working at a frontier AI company (or having worked at one) can give people a better platform to talk about these issues!
    • Both because of giving people deeper expertise (so they are actually more informed on key questions), but also because of making that legible to the outside world
    • For instance, I feel better about GDM publishing their recent content on safety and security than not, and I think the paper would have had much less impact on public discourse if it had come from an unaffiliated group

Probably our crux is that I think how society sees AI development morally is what matters here for navigating the straits, and the science is not going to be able to do the job in time. I care about developing a field of technical AI Safety, but not if it comes at the expense of moral clarity that continuing to train bigger and bigger models before we know they will be safe is not okay. I would much rather rally the public to that message than try to get into the weak safety-paper discourse game (which, tbc, I consider toothless and assume is not guiding Google's strategy).

I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.

I think that the degree of consensus you'd need for the position that you're outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:

  1. Scientific ~consensus -- people look to scientists for thought leadership on this stuff. Plausibly you could have a scientist-driven moratorium (this still feels like a stretch, but less than just switching the way society sees AI without having the scientists leading that)
  2. Freak-out about everyday implications of AI -- sufficiently advanced AI would not just pose unprecedented risks, but also represent a fundamental change in the human condition. This could drive a tide of strong sentiment that doesn't rely on abstract arguments about danger.
  3. Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.

I think there's potentially something to each of these. But I think the GDM paper is (in expectation) actively helpful for 1 and probably 3, and doesn't move the needle much either way on 2.

(My own view is that 3 is the most likely route to succeed. There's some discussion of the pragmatics of this route in AI Tools for Existential Security or AI for AI Safety (both of which also discuss automation of safety research, which is another potential success route), and relevant background views on the big-picture strategic situation in the Choice Transition. But I also feel positive about people exploring routes 1 and 2.)

Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.

Why are these the same category and why are you writing coordination off as impossible? It's not. We have literally done global nonproliferation treaties before.

This bizarre notion got embedded early in EA that technological feats are possible and solving coordination problems is impossible. It's actually the opposite-- alignment is not tractable and coordination is.

These are in the same category because:

  • I'm talking about game-changing improvements to our capabilities (mostly via more cognitive labour; not requiring superintelligence)
  • These are the capacities that we need to help everyone to recognize the situation we're in and come together to do something about it (and they are partial substitutes: the better everyone's epistemics are, the less need for a big lift on coordination which has to cover people seeing the world very differently) 

I'm not actually making a claim about alignment difficulty -- beyond that I do think systems in the vein of those today and the near-successors of those look pretty safe. 

I think that getting people to pause AI research would be a bigger lift than any nonproliferation treaty we've had in the past (not that such treaties have always been effective!). This isn't just military tech; it's a massively valuable economic tech. Given the incentives, and the importance of having treaties actually followed, I do think this would be a more difficult challenge than any past nonproliferation work. I don't think that means it's impossible, but I do think it's way more likely if something shifts -- hence my 1-3.

(Or if you were asking why I say "out of reach now" in the quoted sentence it's because I'm literally talking about "much better coordination" as a capability; not what could or couldn't be achieved with a certain level of coordination.)

I will answer comments that ask sincerely for explanations of my worldview on this. I am aware there is a lot of evidence listing and dot-connecting I didn't do here. 

What do you think would happen at the frontier labs if EAs left their jobs en masse? I understand the view that the newly-departed would be more able to "speak[] out and contribut[e] to real external pressure and regulation." And I understand the view that the leadership isn't listening to safety-minded EAs anyway. 

But there are potential negative effects from the non-EAs who would presumably be hired as replacements. On your view, could replacement hiring make things worse at the labs? If so, how do you balance that downside risk against the advantages of departing the labs?

I think almost nothing would change at the labs, but that the EA AI Safety movement would become less impotent and clearer, and would stand more of a chance of doing good.

No, I do not expect the people who replace them (or them not being replaced) to have much of an effect. I do not think they are really helping and I don't think their absence would really hurt. The companies are following their own agenda and they'll do that with or without specific people in those roles.

If these people weren't really helping the companies, it seems surprising that salaries are so high?

I think Holly’s claim is that these people aren’t really helping from an ‘influencing the company to be more safety conscious’ perspective, or a ‘solving the hard parts of the alignment problem’ perspective. They could still be helping the company build commercially lucrative AI.

Would you say that investing in frontier AI companies (as an individual with normal human levels of capital) is similarly bad?

I think it is hazardous because it in some way ties their "success" to yours.

Thanks for the post, Holly. Strongly upvoted. I did not find the post that valuable per se, but it generated some good discussion.

If you care about protecting the world, you will quit, even though it will be hard to give up the money and the prestige and the hope that they would fix the problem.

People at leading AI companies can earn hundreds of thousands of dollars per year, so quitting could plausibly decrease their donations by 100 k$/year. I estimate donating this to the Shrimp Welfare Project (SWP) would decrease as much pain per year as that needed to neutralise the happiness of 1.25 M human lives (= 100*10^3*639/51). Do you think the benefits of quitting outweigh this? I do not, so I encourage people at leading AI companies to simply donate more to SWP. I imagine no one would quit if there were actual human lives on the line (instead of shrimp, which are not helped).
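For what it's worth, here is a minimal sketch of the arithmetic behind that 1.25 M figure, taking the commenter's 639 and 51 at face value (their derivation and units are not given in the comment, so the variable names below are just illustrative):

```python
# Rough check of the arithmetic in the parent comment.
# 639 and 51 are the commenter's own figures; their derivation is not
# given in the comment, so they are taken at face value here.
forgone_donations_per_year = 100e3   # $/year, the commenter's estimate
numerator_factor = 639               # commenter's figure (units not stated)
denominator_factor = 51              # commenter's figure (units not stated)

human_life_equivalents = forgone_donations_per_year * numerator_factor / denominator_factor
print(f"{human_life_equivalents:,.0f}")  # ~1,252,941, i.e. roughly 1.25 M
```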

I'm not sure I understand the question. If it's about being able to give donations, I wouldn't worry, because these people can be employed elsewhere making comparable salaries.

I think expected future earnings, including salaries and appreciation of equity, would go down in most cases. I thought you would agree because you said "even though it will be hard to give up the money".

I wasn't imagining they were donating the money, frankly. I'm not sure how many people working at AI companies even donate.

Anyway, directly making the world worse is not the only choice for making money.
