I’ve been thinking about my annual donation, and I’ve decided to donate to MIRI this year. I haven’t previously donated to MIRI, and my reasons for doing so now are somewhat nuanced, so I thought they were worth explaining.
I have previously thought that MIRI was taking a somewhat less-than-ideal approach on AI safety, and they were not my preferred donation target. Three things have changed:
- My opinion of the approach has changed a little (actually not much);
- I think they are moving towards a better version of their approach (more emphasis on good explanations of their work);
- The background distribution of work and opportunities in AI safety has shifted significantly.
Overall I do not fully endorse MIRI’s work. I somewhat agree with the perspective of the Open Philanthropy Project’s review, although I am generally more positive towards MIRI’s work:
- I agree with those of their technical advisors who thought that solving the problems on the research agenda could help reduce potential risks from advanced AI, rather than with those who did not.
- I thought that the assessment of the level of progress was quite unfair.
  - A key summary sentence, “One way of summarizing our impression of this conversation is that the total reviewed output is comparable to the output that might be expected of an intelligent but unsupervised graduate student over the course of 1-3 years.”, in particular, seems bemusingly unfair:
    - I think I see significant importance in the work of giving technical framings of the problems in the first place. In some cases once this is done the solutions are not technically hard; I wonder if the OpenPhil review was concerned relatively more with the work on solutions.
      - e.g. I am impressed by giving a theoretical foundation for classical game theory, and I can see various ways this could be useful. Note that this paper wasn’t included in the OpenPhil review (perhaps because one of the authors was one of the OpenPhil technical advisors).
    - The point about supervision felt potentially a bit confused. I think it’s significantly easier to make quick progress when fleshing out the details of an established field than when trying to give a good grounding for a new field.
      - On the other hand, I do think the clarity of writing in some of MIRI’s outputs has not been great, and that this is potentially something supervision could have helped with. I think they’ve been improving on this.
    - My reference class is PhD students in mathematics at Oxford, a group I’m familiar with. I find it plausible that this would line up with the output of some of the most talented such students, but I thought the wording implied comparison to a significantly lower bar than this.
    - (Edit: see also this useful discussion with Jacob Steinhardt in the comment thread)
These views are based on several conversations with MIRI researchers over approximately the last three years, and reading a fraction of their published output.
Two or three years ago, I thought that it was important that AI safety engage significantly more with mainstream AI research, and build towards having an academic field which attracted the interest of many researchers. It seemed that MIRI’s work was quite far from optimised for doing that. I thought that the abstract work MIRI was doing might be important eventually, but that it was less time-critical than field-building.
Now, the work to build a field which ties into existing AI research is happening, and is scaling up quite quickly. Examples:
- Concrete Problems in AI Safety presents a research agenda which is accessible to a much broader community;
- The Future of Life Institute made a number of grants last year;
- The Open Philanthropy Project has given over $5 million to establish a Center for Human-Compatible AI at Berkeley;
- Google DeepMind and OpenAI are both building teams of safety researchers.
I expect this trend to continue for at least a year or two. Moreover I think this work is significantly talent-constrained (and capacity-constrained) rather than funding-constrained. In contrast, MIRI has been developing a talent pipeline and recently failed to reach its funding target, so marginal funds are likely to have a significant effect on actual work done over the coming year. I think that this funding consideration represents a significant-but-not-overwhelming point in favour of MIRI over other technical AI safety work (perhaps a factor of between 5 and 20 if considering allocating money compared to allocating labour, but I’m pretty uncertain about this number).
A few years ago, I was not convinced that MIRI’s research agenda was what would be needed to solve AI safety. Today, I remain not convinced. However, I’m not convinced by any agenda. I think we should be pursuing a portfolio of different research agendas, focusing in each case not on optimising for technical results in the short term, but on optimising for a solid foundation that we can build a field on and attract future talent to. As MIRI’s work looks to be occupying a much smaller slice of the total work going forwards than it has historically, adding resources to this part of the portfolio looks relatively more valuable than before. Moreover MIRI has become significantly better at clear communication of its agenda and work -- which I think is crucial for this objective of building a solid foundation -- and I know they are interested in continuing to improve on this dimension.
The combination of these factors, along with the traditional case for the importance of AI safety as a field, makes me believe that MIRI may well be the best marginal use of money today.
Ways I think this might be a mistake:
- Opportunity cost of money
  - I’m fairly happy preferring funding MIRI to any other direct technical work in AI safety I know of.
  - There might be other opportunities I am unaware of. For example I would like more people to work on Paul Christiano’s agenda. I don’t know a way to fund that directly (though I know some MIRI staff were looking at working on it a few months ago).
  - It seems plausible that money could be better spent by 80,000 Hours or CFAR in helping to develop a broader pipeline of talent for the field. However, I think that a significant bottleneck is the development of really solid agendas, and I think MIRI may be well-placed to do this.
  - Given the recent influx of money, a field other than AI safety might be the best marginal use of resources. I personally think that prioritisation research is extremely important, and would consider donating to the Centre for Effective Altruism to support this instead of AI safety.
    - I haven’t actually considered the direct comparison in this case. I want to stop donating to organisations I work very closely with, as I’m concerned about the potential conflict of interest.[*]
- Opportunity cost of researchers’ time
  - Perhaps MIRI will employ researchers to work on a suboptimal agenda, and they would otherwise get jobs working on a more important part of AI safety (if those other parts are indeed talent constrained).
    - However, I think that the background of MIRI researchers is often not the same as would be needed for work on (say) more machine-learning oriented research agendas.
- Failing to shift MIRI’s focus
  - If MIRI were doing work that was useful but suboptimal, one might think that failure to reach funding targets could get them to re-evaluate. However:
    - I think they are already shifting their focus in a direction I endorse.
    - Withholding funding is a fairly non-cooperative way to try to achieve this. I’d prefer to give funding, and simply tell them my concerns.
Extra miscellaneous factors in favour of MIRI:
- I should have some epistemic humility
  - I’ve had a number of conversations with MIRI researchers about the direction of their research, in moderate depth. I follow and agree with some of the things they are saying. In other cases, I don’t follow the full force of the intuitions driving their choices.
    - The fact that they failed to explain it to me so that I could fully follow decreases my credence that what they have in mind is both natural and correct (relative to before they tried this), since I think it tends to be easier to find good explanations for natural and correct things.
    - This would be a stronger update for me, except that I’ve also had the experience of people at MIRI repeatedly failing to convey something to me, and then succeeding over a year later. A clean case of this is that I previously believed decision theory was pretty irrelevant for AI safety, and I now see mechanisms for it to matter. This is good evidence that at least in some cases they have access to intuitions which are correct about something important, even when they’re unable to clearly communicate them.
  - In these conversations I’ve also been able to assess their epistemics and general approach.
    - I don’t fully endorse these, but they seem somewhat reasonable. I also think some of my differences arise from differences in communication style.
    - Some general trust in their epistemics leads me to have some belief that there are genuinely useful insights that they are pursuing, even when they aren’t yet able to clearly communicate them.
  - (Edit: see also this discussion with Anna Salamon in the comment thread.)
- Training and community building
  - I think MIRI has a culture which encourages some useful perspectives on AI safety (I’m roughly pointing towards what they describe as “security mindset”).
    - I’m less convinced than they are that this mindset is particularly crucial, relative to, e.g., an engineering mindset, but I do think there is a risk of it being under-represented in a much larger AI safety community.
  - I think that one of the more effective ways to encourage deep sharing of culture and perspective between research groups is exchange of staff.
    - If MIRI has more staff in the short term, this will allow greater dispersal of this perspective in the next few years.
- Money for explicitly long-term work will tend to be neglected
  - As AI systems become more powerful over the coming decades, there will be increasing short-term demand for AI safety work. I think that in many cases high-quality work producing robust solutions to short-term problems could be helpful for some of the longer-term problems. However there will be lots of short-term incentives to focus on short-term problems, or even long-term problems with short-term analogues. This means that altruistic money may have more leverage over the long-term scenarios.
  - Note that this is particularly an argument about money. I think that there are important reasons to skew work towards scenarios where AI comes particularly soon, but I think it’s easier to get leverage over that as a researcher choosing what to work on (for instance doing short-term safety work with longer-term implications firmly in view) than as a funder.
Overall, I don’t think we understand the challenges to come well enough that we should commit to certain approaches yet. I think MIRI has some perspectives that I’d like to see explored and explained further, I think they’re moving in a good direction, and I’m excited to see what they’ll manage in the next couple of years.
Disclaimers: These represent my personal views, not those of my employers. Several MIRI staff are known personally to me.
[*] There are actually some tax advantages to my donating to CEA by requesting a lower salary. This previously swayed me to donate to CEA, but I think I actually care more about the possible bias. However, if someone who was planning to donate to CEA wants to do a donation switch with me, we could recover and split these benefits, probably worth a few hundred dollars. Please message or email me if interested.
What in the grant write-up makes you think the focus was on number-of-papers-written? I was one of the reviewers and that was definitely not our process.
(Disclaimer: I'm a scientific advisor for OpenPhil, all opinions here are my own.)
Thanks for pointing that out! I've been conflating your comments with other conversations I've had with people about MIRI, and have removed my sentence. I just read through the OpenPhil report carefully again.
I think that I disagree with OpenPhil's stated conclusions, but due to having looked at different papers (I had forgotten that the 'unsupervised grad student' comment referred just to the three papers submitted, and I'd mis-remembered exactly which papers they were). After conversations with a few early-stage researchers in other fields, I think that ...