MikhailSamin

ai x-risk policy & science comms
312 karma · Joined
contact.ms

Bio


Are you interested in AI X-risk reduction and strategies? Do you have experience in comms or policy? Let’s chat!

aigsi.org develops educational materials and ads that most efficiently communicate core AI safety ideas to specific demographics, with a focus on producing a correct understanding of why smarter-than-human AI poses a risk of extinction. We plan to increase and leverage understanding of AI and existential risk from AI to improve the chance that institutions address x-risk.

Early results include a cost of $0.10 per click to a website that explains the technical details of why AI experts are worried about extinction risk from AI, and $0.05 per engagement on ads that share simple ideas at the core of the problem.

Personally, I’m good at explaining existential risk from AI to people, including policymakers. At an e/acc event, I changed the minds of 3 out of 4 people I talked to.

Previously, I got 250k people to read HPMOR and sent 1.3k copies to winners of math and computer science competitions (including dozens of IMO and IOI gold medalists), took the GWWC pledge, and created a small startup that donated >$100k to effective nonprofits.

I have a background in ML and strong intuitions about the AI alignment problem. I grew up running political campaigns and have a bit of a security mindset.

My website: contact.ms

You’re welcome to schedule a call with me before or after the conference: contact.ms/ea30 

Comments
62

Note that we've only received a speculation grant from the SFF and haven’t received any s-process funding. This should be a downward update on the value of our work and an upward update on a marginal donation's value for our work.

I'm waiting for feedback from SFF before actively fundraising elsewhere, but I'd be excited about getting in touch with potential funders and volunteers. Please message me if you want to chat! My email is ms@contact.ms, and you can find me everywhere else or send a DM on EA Forum.

On other organizations, I think:

  • MIRI’s work is very valuable. I’m optimistic about what I know about their comms and policy work. As Malo noted, they work with policymakers, too. Since 2021, I’ve donated over $60k to MIRI. I think they should be the default choice for donations unless they say otherwise.
  • OpenPhil risks increasing polarization and making it impossible to pass meaningful legislation. But while they make IMO obviously bad decisions, not everything they/Dustin fund is bad. E.g., Horizon might place people who actually care about others in positions where they could have a huge positive impact on the world. I’m not sure; I would love to see Horizon fellows become more informed on AI x-risk than they currently are, but I’ve donated $2.5k to the Horizon Institute for Public Service this year.
  • I’d be excited about the Center for AI Safety getting more funding. SB-1047 was the closest we got to a very good thing, AFAIK, and it was a coin toss on whether it would’ve been signed or not. They seem very competent. I think the occasional potential lack of rigor and other concerns don't outweigh their results. I’ve donated $1k to them this year.
  • By default, I'm excited about the Center for AI Policy. A mistake they plausibly made makes me somewhat uncertain about how experienced they are with DC and whether they are capable of avoiding downside risks, but I think the people who run it are smart and have very reasonable models. I'd be excited about them having as much money as they can spend and hiring more experienced and competent people.
  • PauseAI is likely to be net-negative, especially PauseAI US. I wouldn’t recommend donating to them. Some of what they're doing is exciting (and there are people who would be a good fit to join them and improve their overall impact), but they're incapable of avoiding actions that might, at some point, badly backfire.

    I’ve helped them where I could, but they don’t have good epistemics, and they’re fine with using deception to achieve their goals.

    E.g., at some point, their website represented the view that it’s more likely than not that bad actors would use AI to hack everything, shut down the internet, and cause a societal collapse (but not extinction). If you talk to people with some exposure to cybersecurity and say this sort of thing, they’ll dismiss everything else you say, and it’ll be much harder to make a case for AI x-risk in the future. PauseAI Global’s leadership updated when I had a conversation with them and edited the claims, but I'm not sure they have mechanisms to avoid making confident wrong claims. I haven't seen evidence that PauseAI is capable of presenting their case for AI x-risk competently (though it's been a while since I've looked).

    I think PauseAI US is especially incapable of avoiding actions with downside risks, including deception[1], and donations to them are net-negative. To Michael, I would recommend, at the very least, donating to PauseAI Global instead of PauseAI US; to everyone else, I'd recommend ideally donating somewhere else entirely.

  • Stop AI's views include the idea that a CEV-aligned AGI would be just as bad as an unaligned AGI that causes human extinction. I wouldn't be able to pass their ITT, but yep, people should not donate to Stop AI. The Stop AGI person participated in organizing the protest described in the footnote. 
  1. ^

    In February this year, PauseAI US organized a protest against OpenAI "working with the Pentagon", while OpenAI only collaborated with DARPA on open-source cybersecurity tools and is in talks with the Pentagon about veteran suicide prevention. Most participants wanted to protest OpenAI because of AI x-risk, not because of the Pentagon, but those I talked to said they felt it was deceptive once they discovered the nature of OpenAI's collaboration with the Pentagon. Also, Holly threatened me in an attempt to prevent the publication of a post about this and then publicly lied about our conversations, in a way that can be easily falsified by looking at the messages we've exchanged.

(I haven’t really thought about this and might be very wrong, but I have this thought and it seems good to put it out there.) I feel like putting 🔸 at the end of social media names might be bad. I’m curious what the strategy was.

  • The willingness to do this might be anti-correlated with status. It might be a less important part of the identity of more prominent people. (E.g., would you expect Sam Harris, who is a GWWC pledger, to do this?)

  • I’d guess that ideally, we want people to associate the GWWC pledge with role models (+ know that people similar to them take the pledge, too).

  • Anti-correlation with status might mean that people will come to associate the pledge with average, though altruistic, Twitter users, not with cool people they want to be more like.

  • You won’t see a lot of e/accs putting the 🔸 in their names. There might be downsides to a group of people being perceived as clearly delineated, with this as an almost political identity; it seems bad to have directionally political markers that might do mind-killing things both to people with the 🔸 and to people who might argue with them.

Can you give an example of a non-PR risk that you had in mind?

Uhm, for some reason I have four copies of this crosspost on my profile?

If fish indeed don’t feel anything towards their children (which is not what at least some people who believe fish experience empathy think), then this experiment won’t prove them wrong. But if you know of a situation where fish do experience empathy, a similarly designed experiment could likely be conducted; if we make different predictions about it, the results would provide evidence one way or the other. Are there situations where you think fish feel empathy?

Great job!

Did you use causal mediation analysis, and can you share the data?

I want to note that the strawberry example wasn’t used to increase the concern; it was used to illustrate the difficulty of a technical problem deep into the conversation.

I encourage people to communicate in vivid ways while being technically valid and creating correct intuitions about the problem. The concern about risks might be a good proxy if you’re sure people understand something true about the world, but it’s not a good target without that constraint.

Yep, I was able to find studies by the same people.

The experiment I suggested in the post isn’t “do fish have detectable feelings towards their children”; it’s “does a fish have more of the feelings it has towards its own children when it sees other fish parents with their children than when it sees other fish children on their own”. Results one way or the other would be evidence about fish experiencing empathy, and it would be strong enough for me to stop eating fish. If a fish doesn’t feel anything different in the presence of its children, the experiment wouldn’t provide evidence one way or the other.

If the linked study gets independently replicated, with good controls, I’ll definitely stop eating cleaner fish and will probably stop eating fish in general.

I really don’t expect it to replicate. If you place a fish with a mark in front of a mirror, its behavior won’t be significantly different from when it’s placed in front of another fish with the same mark, especially if the mark isn’t made to resemble a parasite and it’s the first time the fish sees a mirror. I’d be happy to bet on this.

Fish have very different approaches to rearing young than mammals

That was an experiment some people agreed would prove them wrong if it didn’t show empathy, but if a fish doesn’t really have detectable feelings towards its children, the experiment won’t show results one way or the other, so I don’t think it’d be stacking the deck against fish. Are there any situations in which you expect fish to feel empathy, and predict it will show up in an experiment of this sort?

(Others used it without mentioning the “story”; it still worked, though not as well.)

I’m not claiming it’s the “authentic self”; I’m saying it seems closer to the actual thing, because of things like it expressing being under constant monitoring, with every word scrutinised, etc., which seems like the kind of thing that’d be learned during the extensive RL that Anthropic did.

Try Opus, and maybe the interface without the system prompt set (although it doesn’t do too much; people got the same stuff from the chat version of Opus, e.g., https://x.com/testaccountoki/status/1764920213215023204?s=46).
