Former software developer and investor. Now working on several projects to reduce x-risk from AI.
If you're interested in how we can usefully spend more than a trillion dollars per year on AI safety, reach out.
Thanks for the feedback!
Regarding (a), it doesn't seem clear to me that conditional on Impact List being wildly successful (which I'm interpreting as roughly the $110B over ten years case), we shouldn't expect it to account for more than 10% of overall EA outreach impact. Conditional on Impact List accounting for $110B, I don't think I'd feel surprised to learn that EA controls only $400B (or even $200B) instead of ~$1T. Can you say more about why that would be surprising?
(I do think there's a ~5% chance that EA controls or has deployed $1T within ten years.)
I think (b) is a legit argument in general, although I have a lot of uncertainty about what the appropriate discount should be. This also highlights that using dollars for impact can be unclear, and that my EV calculation bucketed money as either 'ineffective' or 'effective' without spelling out the implications.
A few implications of that:
Given the bucketing, and that "$X of value" doesn't mean "$X put into the most effective cause area", I think it may be reasonable not to have a discount. Not having a discount assumes that over the next ten years we'll find enough (or scalable enough) cause areas, at least as effective as whatever threshold value we pick, that they can soak up an extra ~$110B. This is probably a lot more plausible to those who prioritize x-risk than to those who think global health will be the top cause area over that period.
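To make that concrete, here's a minimal sketch of how a discount would enter the kind of bucketed EV calculation I'm describing. The probability and dollar amounts below are placeholders for illustration, not the figures from my actual calculation:

```python
# Illustrative only: a toy version of a bucketed EV calculation, with an
# optional discount applied to the 'effective' bucket.
# All numbers are made up for the example.

p_wild_success = 0.01        # hypothetical probability of the ~$110B outcome
effective_dollars = 110e9    # dollars that land in the 'effective' bucket

def expected_value(discount: float) -> float:
    """EV where `discount` reflects how much less valuable marginal
    'effective' dollars are if top cause areas can't fully absorb them."""
    return p_wild_success * effective_dollars * (1 - discount)

print(f"${expected_value(0.0):,.0f}")   # no discount:  $1,100,000,000
print(f"${expected_value(0.5):,.0f}")   # 50% discount: $550,000,000
```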
Yeah, it will be very time-intensive.
When we evaluate people who don't make the list, we can maintain pages for them on the site showing what we do know about their donations, so that a search would surface their page even if they're not on the list. Such a page would essentially explain why they're not on the list by showing the donations we know about, and which recipients we've evaluated vs. those we've assigned default effectiveness values to based on their category.
I think we can possibly offload some of the research work onto people who think we're wrong about who is on the list, by being very willing to update our data if anyone sends us credible evidence about any donation we missed, or persuasive evidence about the effectiveness of any org. The existence of a donation seems way easier to verify than to discover. Maybe the potential list-members themselves would send us a lot of this data from alt accounts.
I think Impact List does want to present itself as a best-effort attempt at being comprehensive. We'll acknowledge that of course we've missed things, but that it's a hard problem and no one has come close to doing it better. Combined with our receptivity to submitted data, my guess is that most people would be OK with that (conditional on them being OK with how we rank people who are on the list).
I think that even on EA's own terms (apart from any effects of EA being fringe), there's a good reason for EAs to be OK with being more stressed and unhappy than people with other philosophies.
On the scale of human history, we're likely in an emergency situation in which we have an opportunity to trade off the happiness of EAs for enormous gains in total well-being. It's similar to how, during a bear attack, you'd accept that you won't feel relaxed and happy while you try to mitigate the attack, but that period of stress is worth it overall. This is especially true if you believe we're in the hinge of history.
I also mention this in my response to your other comment, but in case others didn't notice that: my current best guess for how we can reasonably compare across cause areas is to use something like WALYs. For animals my guess is we'll adjust WALYs with some measure of brain complexity.
In general, the rankings will be super sensitive to assumptions. Through really high-quality research we might be able to reduce disagreements a little, but no matter what there will still be lots of disagreement about assumptions.
I mentioned in the post that the default ranking might eventually become some blend of rankings from many EA orgs. Nathan has a good suggestion below about using surveys to do this blending. A key point is that you can factor out just the differences in assumptions between two rankings and survey people about which assumptions they find most credible.
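As a rough illustration of what that blending could look like (org names, scores, and survey weights below are all hypothetical):

```python
# Hypothetical sketch of blending multiple orgs' rankings into a default
# ranking using survey-derived credence weights on their assumption sets.
# Org names, scores, and weights are invented for illustration.

rankings = {
    "org_a": {"donor_1": 0.9, "donor_2": 0.4, "donor_3": 0.7},
    "org_b": {"donor_1": 0.5, "donor_2": 0.8, "donor_3": 0.6},
}

# Survey respondents' relative credence in each org's assumptions (sums to 1).
survey_weights = {"org_a": 0.6, "org_b": 0.4}

def blended_scores(rankings, weights):
    donors = set().union(*(r.keys() for r in rankings.values()))
    return {
        donor: sum(weights[org] * rankings[org].get(donor, 0.0) for org in rankings)
        for donor in donors
    }

default_ranking = sorted(blended_scores(rankings, survey_weights).items(),
                         key=lambda kv: kv[1], reverse=True)
print(default_ranking)  # e.g. [('donor_1', 0.74), ('donor_3', 0.66), ('donor_2', 0.56)]
```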
I think you highlight something really important at the end of your post about the benefit of making these assumptions explicit.
Agreed. I guess my intuition is that using WALYs for humans+animals (scaled for brain complexity), humans only, and longtermist beings will be a decent enough approximation for maybe 80% of EAs and over 90% of the general public. Not that it's the ideal metric for these people, but good enough that they'd treat the results as pretty important if they knew the calculations were done well.
Yeah, I've lately been considering just three options for moral weights: 'humans only', 'including animals', and 'longtermist', with the first two being implicitly neartermist.
It seems like we don't need 'longtermist with humans only' and 'longtermist including animals' because if things go well the bulk of the beings that exist in the long run will be morally relevant (if they weren't we would have replaced them with more morally relevant beings).
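To make the three options concrete, here's a toy sketch of how impact could be scored under each moral-weight profile, with animal WALYs scaled by a brain-complexity factor. Every number in it is invented for illustration:

```python
# Toy illustration of scoring impact under the three moral-weight options.
# WALY figures and the brain-complexity factor are invented for the example.

walys_per_million_usd = {"humans": 100.0, "animals": 5000.0, "longterm_beings": 1e6}

brain_complexity_factor = 0.05   # hypothetical adjustment for the animals affected

profiles = {
    "humans_only":       {"humans": 1.0, "animals": 0.0,                     "longterm_beings": 0.0},
    "including_animals": {"humans": 1.0, "animals": brain_complexity_factor, "longterm_beings": 0.0},
    "longtermist":       {"humans": 1.0, "animals": brain_complexity_factor, "longterm_beings": 1.0},
}

for name, weights in profiles.items():
    score = sum(weights[k] * walys_per_million_usd[k] for k in walys_per_million_usd)
    print(f"{name}: {score:,.0f} adjusted WALYs per $1M donated")
```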
Hi Ben. I just read the transcript of your 80,000 Hours interview and am curious how you'd respond to the following:
Analogy to agriculture and industry
You say that it would be hard for a single person (or group?) acting far before the agricultural revolution or industrial revolution to impact how those things turned out, so we should be skeptical that we can have much effect now on how an AI revolution turns out.
Do you agree that the strength of this analogy is roughly proportional to how slow our AI takeoff is? For instance, if the first AGI ever created becomes more powerful than the rest of the world, then it seems that anyone who influenced the properties of that AGI would have a huge impact on the future.
Brain-in-a-box
You argue that if we transition more smoothly to AGI, via super powerful narrow AIs that slowly expand in generality, we'll be less caught off guard and better prepared.
It seems that even in a relatively slow takeoff, you wouldn't need that big of a discontinuity to result in a singleton AI scenario. If the first AGI that's significantly more generally intelligent than a human is created in a world where lots of powerful narrow AIs exist, wouldn't having a super smart thing at the center of control of a bunch of narrow AI tools plausibly be way more powerful than having human brains at the center of that control?
It seems plausible that in a "smooth" scenario, the time between the first group creating AGI and the second group creating an equally powerful one could be months. Do you think a months-long discontinuity is not enough for an AGI to pull sufficiently ahead?
Even if multiple groups create AGIs within a short time, isn't having a bunch of unaligned AGIs all trying to get power at the same time also an existential risk? It doesn't seem clear that they'd automatically keep each other in check. One might simply be better at growing or better at sabotaging other AIs. Or if they reach a stalemate they might start cooperating with each other to achieve unaligned goals as a compromise.
Maybe narrow AIs will work better
You say that since today's AIs are narrow, and since there's often benefit in specialization, maybe in the future specialized AIs will continue to dominate. You say "maybe the optimal level of generality actually isn’t that high."
My model is: if you have a central control unit (a human brain, or a group of human brains) deciding how to use a bunch of narrow AIs, then if you replace that central control unit with one that is more intelligent and faster-acting, the whole system will be more effective.
The only way I can think of for that not to be true would be if the general AI required so many computational resources that the narrow AIs acting as its tools were crippled by lack of resources. Is that what you're imagining?
Deadline model of AI progress
You say you disagree with the idea that the day we create AGI acts as a sort of 'deadline', such that if we don't figure out alignment before then, we're screwed.
A lot of your argument is about how increasing AI capability and alignment are intertwined processes, so that as we increase an AI's capabilities we're also increasing its alignment. You discuss how it's not like we're going to create a super powerful AI and then give it a module with its goals at the end of the process.
I agree with that, but I don't see it as substantially affecting the Bostrom/Yudkowsky arguments.
Isn't the idea that we would have something that seemed aligned as we were training it (based on this continuous feedback we were giving it), but that only when it became extremely powerful would we realize it wasn't actually aligned?
This seems to be a disagreement about "how hard is AI alignment?". I think Yudkowsky would say that it's so hard that your AI can look perfectly aligned while it's less powerful than you, yet you've gotten something slightly wrong that only manifests once it has taken over. Do you agree that's a crux?
You talk about how AIs can behave very differently in different environments. Isn't the environment of an AI that happens to be the most powerful agent on Earth fundamentally different from any environment we could provide when training an AI (in terms of resources at its disposal, strategies it might be aware of, etc.)?
Instrumental convergence
You talk about how, even if almost all goals would result in instrumental convergence, we're free to pick any goals we like, so we can pick from the very small subset of goals that don't result in instrumental convergence.
It seems like there's a tradeoff between AI capability and not exhibiting instrumental convergence, since to avoid instrumental convergence you basically need to tell the AI "You're not allowed to do anything in this broad class of things that will help you achieve your goals." An AI that amasses power and is willing to kill to achieve its goals is by definition more powerful than one that eschews becoming powerful and killing.
In a situation where there may be many groups trying to create an AGI, doesn't this imply that the first AGI that does exhibit instrumental convergence will have a huge advantage over any others?
A website to crowdsource research for Impact List
Impact List is building up a database of philanthropic donations from wealthy individuals, as a step in ranking the top ~1000 people by positive impact via donations. We're also building a database of info on the effectiveness of various charities.
It would be great if a volunteer could build a website with the following properties:
-It contains pages for each donor, and for each organization that is the target of donations.
-Pages for donors list every donation they've ever made, with the date, target organization, amount, and any evidence that this donation actually happened.
-Pages for target organizations contain some estimate of each component of the ITN framework, as well as evidence for each of these components.
-There is a web form that allows any Internet user to easily submit a new entry into either of these data sources, which can then be reviewed/approved by the operators of Impact List based on the evidence submitted.
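As a very rough sketch of what the underlying data model might look like (field names and structure here are assumptions for illustration, not a spec):

```python
# A rough sketch of the two data sources described above; field names and
# structure are illustrative assumptions, not a spec.
from dataclasses import dataclass, field


@dataclass
class Donation:
    donor: str
    recipient_org: str
    date: str                # e.g. "2021-06-15"
    amount_usd: float
    evidence_urls: list[str] = field(default_factory=list)  # evidence the donation happened


@dataclass
class OrgEvaluation:
    org_name: str
    importance: float        # ITN components, each with supporting evidence
    tractability: float
    neglectedness: float
    evidence_urls: list[str] = field(default_factory=list)


@dataclass
class Submission:
    """A user-submitted entry awaiting review/approval by Impact List operators."""
    payload: Donation | OrgEvaluation
    submitted_by: str
    approved: bool = False
```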