I'm Aaron; I've done uni group organizing at the Claremont Colleges for a bit. My current cause prioritization is AI Alignment.
My understanding of your main claim: If AGI is not a magic problem-solving oracle and is instead limited by needing to be unhobbled and integrated with complex infrastructure, it will be relatively safe for model weights to be available to foreign adversaries. Or at least key national security decision makers will believe that's the case.
Please correct me if I'm wrong. My thoughts on the above:
Where is this relative safety coming from? Is it from expecting that adversaries won't be able to figure out how to do unhobbling, or to steal the necessary secrets to do it? Is it from expecting that unhobbling and building infrastructure around AIs will be a really hard endeavor?
The way I'm viewing this picture, AI that can integrate all across the economy, even if that takes substantial effort, is a major threat to global stability and US dominance.
I guess you can think about the AI-for-productive-purposes supply chain as having two components: developing the powerful AI model (Initial development), and unhobbling it / integrating it into workflows / etc. (Unhobbling/Integration). You're arguing that the second of these will be an acceptable place to focus restrictions. My intuition says we will want restrictions on both, but more on whichever part is most expensive or excludable (e.g., AI chips being concentrated is a point for initial development). It's not clear to me what the relative cost of the two supply chain steps is: currently, pre-training costs look higher than fine-tuning costs (a point for initial development), but actually integrating AIs across the economy seems very expensive to do, since the economy is really big (a point for unhobbling/integration); this depends a lot on the systems at the time and how easy they are to work with.
Are you all interested in making content or doing altruism-focused work about AI or AI Safety?
I'll toss out that a lot of folks in the Effective Altruism-adjacent sphere are involved in efforts to make future AI systems safe and beneficial for humanity. If you all are interested in producing content or making a difference around artificial intelligence or AI Safety, there are plenty of people who would be happy to help you, e.g., better understand the key ideas and how to convey them, or understand funding gaps in the ecosystem. I, for one, would be happy to help with this — I think mitigating extinction risks from advanced AI systems is one of the best opportunities to improve the world, although it's quite different from standard philanthropy. PS: I was subscribed to Jimmy back at ~10k :)
"Poor people are generally bad at managing their own affairs and need external guidance"
That seems like a particularly cynical way of describing this argument. Another description might be: individuals are on average fine at identifying ways to improve their lives, but if you think life improvements are heavy-tailed, this implies that individuals will perform much less well than experts who aim to find the positive-tail interventions.
Here's a similar situation: a high school student is given 2 hours with no distractions and told to study for a test. How do you think their study method of choice would compare to a studying curriculum designed for them by a professional tutor? My guess is that the tutor-designed curriculum is somewhere between 20% and 200% better, depending on the student. Now, that's still really far from 719x, but I think it's fine for building the intuition. I wouldn't necessarily say the student is "bad at managing their own affairs"; in fact, they might be solidly average for students. But I would say they're not an expert at studying, and, like other domains, studying benefits from expertise.
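To gesture at why heavy tails change the picture, here's a toy simulation. This is entirely my own illustration rather than anything from the 719x analysis; the lognormal distribution and its parameters are made up purely to show the mechanism of "pick whatever you happen on" vs. "compare many candidates and take the best":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: intervention "quality" is lognormal, i.e. heavy right tail.
# sigma controls how heavy the tail is; both parameters are arbitrary.
qualities = rng.lognormal(mean=0.0, sigma=2.0, size=(50_000, 100))

# "Individual": takes whichever single intervention they happen to pick.
individual = qualities[:, 0].mean()

# "Expert": compares all 100 candidate interventions and picks the best.
expert = qualities.max(axis=1).mean()

print(f"average individually chosen intervention: {individual:.1f}")
print(f"average expert-selected intervention:     {expert:.1f}")
print(f"ratio: {expert / individual:.0f}x")
```

With a thin-tailed distribution the same comparison gives a ratio close to 1x, so how big the gap gets (and whether it approaches anything like 719x) mostly comes down to how heavy the tail really is and how many candidates the expert can actually compare.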
Thanks for writing this. I agree that this makes me nervous. Various thoughts:
I think I’ve slowly come to believe something like, ‘sufficiently smart people can convince themselves that arbitrary morally bad things are actually good’. See, e.g., the gymnastics meme, but there’s also something deeper, like ‘many of the evil people throughout history have believed that what they’re doing is actually good’. I think the response to this should be deep humility and moral risk aversion. Having a big-brain argument that sounds good to you about why what you’re doing is good is actually extremely weak evidence about the goodness of the thing. I think it would probably be better if EAs took this more seriously and didn’t do things like starting an AGI company or starting an AGI hedge fund. An AGI hedge fund seems even worse than Anthropic (where I think the argument for cutting-edge research is medium-brained and at least somewhat true empirically). The reasons Chana lists for why a hedge fund could be a good idea all seem fairly weak — they would be stronger if Leopold were saying these were part of the plan.
The unilateralist nature of this and its relationship to race dynamics also worry me. Maybe there would have been AGI hedge funds anyway, and maybe there would have been lengthy blog posts telling the USG and China that they should be in a massive race on AI — but those things sure weren’t being done before Leopold did them.
I don’t think I have strong reasons to actively trust Leopold. I don’t know him, and I think my baseline trust isn’t super high nowadays. By “trust” I mean some combination of being of good character, having correct judgment, and having good epistemic practices to make up for poor judgment. Choosing to lose OpenAI equity is a positive sign, but I’m not sure how big. So this cashes out in my not making much of an update on the value of an AGI hedge fund — something that seems initially medium bad.
I think it’s sus to write up a blog post telling people AGI is coming soon while starting an investment firm that will benefit from people thinking AGI is coming soon. This is clearly a conflict of interest. It’s not necessarily a bad thing — there are good arguments for putting your money where your mouth is and taking action based on big-if-true ideas, but it is a warning flag.
I could imagine a normal person reading Situational Awareness, including the part about Superalignment, and then hearing that the author is starting an AGI hedge fund, and their response being “WTF?! You believe all this about the intelligence explosion and how there are critical safety problems we’re not on track to solve, and you’re starting a hedge fund?” This response makes a lot of sense to me (and I do think I’ve heard it somewhere, though I’m not sure where). I think ‘starting an AGI hedge fund’ is really low on the list of things somebody who cares a lot about superintelligence safety should be doing. So either I’m misunderstanding something, or this is an update that Leopold isn’t as serious about ASI safety as I thought.
I have yet to see any replies from Leopold to people commenting on or responding to Situational Awareness. This seems like bad form for truth-seeking and for getting buy-in from EAs, but it may be the norm for general intellectual content.
The paper that introduces the test is probably what you're looking for. Based on a skim, it seems to me that it spends a lot of words laying out the conceptual background that would make this test valuable. Obviously it's heavily selected for making the overall argument that the test is good.
Elaborating on point 1 and the "misinformation is only a small part of why the system is broken" idea:
The current system could be broken in many ways but still sit at an equilibrium of sorts. Upsetting this equilibrium could have substantial effects because, for instance, people's built-up immune response to current misinformation is not as well trained as their built-up immune response to traditionally biased media.
Additionally, intervening on misinformation could be far more tractable than other methods of improving things. I don't have a solid grasp of what the problem is and what makes it worse, but a number of potential causes do seem much harder to intervene on than misinformation: general ignorance, poor education, political apathy. It could be the case that misinformation makes the situation merely 5% worse but is substantially easier to fix than these other issues.
I appreciate you writing this, it seems like a good and important post. I'm not sure how compelling I find it, however. Some scattered thoughts:
Because current outsourcing is of data labeling, I think one of the issues you express in the post is very unlikely to come about:
My general worry is that in future, the global south shall become the training ground for more harmful AI projects that would be prohibited within the Global North. Is this something that I and other people should be concerned about?
Maybe there's an argument about how:
This is possible, but my best guess is that low wages are the primary reason for current outsourcing.
Additionally, as noted by Larks, outsourcing data centers is going to be much more difficult, or at least take much longer, than outsourcing data labeling, so we should be less worried that companies could effectively get around laws by doing this.
This sounds like a hypothesis that makes predictions we can go check. Did you have any particular evidence in mind? This and this come to mind, but there is plenty of other relevant stuff, and many experiments that could be quickly done for specific domains/settings.
Note that you say "something very special" whereas my comment is actually about a stronger claim like "AI performance is likely to plateau around human level because that's where the data is". I don't dispute that there's something special here, but I think the empirical evidence about plateauing — that I'm aware of — does not strongly support that hypothesis.