Status: haven't spent long thinking about this
I think it would be useful to have a rough idea of what experts think about the relative AI x-risk from misalignment (the AI doesn't try to do what we want), misuse (the AI does what someone - say, a hostile or incompetent actor - wants), and incompetence (the AI tries to do what we want, but fails - e.g. because it's unreliable or doesn't understand humans well), and also of how tractable work on each of these would be at reducing x-risk. These categories are from Paul Christiano's 'Current work in AI alignment' talk. Obviously, these are extremely difficult questions and we won't get highly robust estimates, but I think something is better than nothing here.
I think (but not with high confidence) that incompetence seems much less risky than misalignment or misuse:
- A sufficiently incompetent AI will be obviously not useful, and so won't be deployed.
- AI capabilities research directly improves competence, and is far less constrained and more strongly financially incentivised than alignment or governance research.
- Experts like Christiano seem less worried about incompetence - in Christiano's case, because he thinks/hopes AI won't need a deep understanding of humans to avoid causing doom.
But I find it hard to know where to start when comparing misalignment and misuse. My first attempt was to look for existing work. I found:
- some detailed work on different potential sources of AI risk - but not comparing across misalignment and misuse scenarios,
- expert predictions on different sources of AI x-risk (top comment here); their Misuse category excludes cases of misuse involving war. I was also granted access to the private survey writeup, which gives more context. But, unless I've missed something, neither explains why the experts hold the views they do,
- some experts' opinions about how well we seem to be doing at alignment / how hard alignment is, but nothing on misuse or comparing the tractability of the two.
It's quite possible I'm missing resources, so this is also a call for resources, which could eventually be compiled into a document.
I'm not sure this is the most important set of resources to compile. But besides being of interest to me, I think it could be useful for anyone choosing between working in AI governance and working in alignment, for whom personal fit considerations don't dominate the decision.
I think this mostly hasn't been done, but here's one survey that finds large disagreement among experts.
Thanks. Yes, that was the survey I mentioned.