Neel Nanda

4253 karma · Joined · neelnanda.io

Bio

I lead the DeepMind mechanistic interpretability team

Comments (322)

Seems reasonable (tbh with that context I'm somewhat OK with the original ban), thanks for clarifying!

"when they're considering buying mansions in the Oxford countryside/other controversial multimillion dollar calculations, publishing the cost-benefit calculation rather than merely asserting its existence"

Huh? That wasn't CEA's decision, they just fiscally sponsored Wytham

1 is very true. 2 I agree with apart from the word "main" - it seems hard to label any single factor as "the main" thing, and there's a bunch of complex reasoning about counterfactuals: e.g. if GDM stopped capabilities work that wouldn't stop Meta, so is GDM working on capabilities actually the main thing?

I'm pretty unconvinced that not sharing results with frontier labs is tenable - leaving aside that these labs are often the best places to do certain kinds of safety work, if our work is to matter, we need the labs to use it! And you often get valuable feedback on the work by seeing it actually used in production. Having a bunch of safety people who work in secret and then unveil their safety plan at the last minute seems very unlikely to work to me

I personally think that "does this advance capabilities" is the wrong question to ask, and instead you should ask "how much does this advance capabilities relative to safety". Safer models are just more useful, and more profitable, a lot of the time! E.g. I care a lot about avoiding deception, but honest models are just generally more useful to users (beyond white lies, I guess), and I think it would be silly for no one to work on detecting or reducing deception. I think most good safety work will inherently advance capabilities in some sense, and this is a sign that it's actually doing something real. I struggle to think of any work that I consider useful and that doesn't advance capabilities at all

Ah, thanks, that's important context - I semi-retract my strongly worded comment above, depending on exactly how bad the removed post was, but can imagine posts in this genre that I think are genuinely bad

Strong +1 to Richard, this seems a clear incorrect moderation call and I encourage you to reverse it.

I'm personally very strongly opposed to killing people because they eat meat, and the general ethos behind that. I don't feel in the slightest offended or bothered by that post, it's just one in a string of hypothetical questions, and it clearly is not intended as a call to action or to encourage action.

If the EA Forum isn't somewhere where you can ask a perfectly legitimate hypothetical question like that, what are we even doing here?

I think this is a valid long-term concern, but it takes at least a few months to properly propagate - if someone qualified tells you that when hiring they look at a GitHub profile, that's probably pretty good for the duration of your job search

I made a reasonably large donation to LTFF at the time of the match, and it felt very clear to me exactly what the situation was, that the matching funds were questionably counterfactual, and felt like just a small bonus to me. I thought the comms there were good.

I imagine you can get a lot of the value here more cheaply by reaching out to people in the field and asking them a bunch of questions about what signals do and do not impress them?

Doing internships etc is valuable to get the supervision to DO the impressive projects, of course.

EDIT: Speaking as someone who hires interpretability researchers, there are a bunch of signals I look for and ones I don't care about, and people new to the field sometimes have very inaccurate guesses here

Ah, fair. Yes, I agree that's a plausible factor, especially for nicher areas
