I've just completed a comprehensive presentation of my current thinking on AI alignment. In it I give a formal statement of the AI alignment problem that is rigorous and makes clear the philosophical assumptions being made when we say we want to create AI aligned with human values. In the process I show the limitations of a decision theory based approach to AI alignment, similarly show the limitations of an axiological approach, and end up with a noematological approach (an account given in terms of noemata, or the objects of conscious phenomena) that better encompasses all of what is necessarily meant by "align AI with human values", given that we are likely to need to align non-rational agents.
I have yet to flesh out my thoughts on the prioritization consequences of this, i.e. what work we should pursue to achieve AI alignment, but I have some initial thoughts I'd like to share and solicit feedback on. Without further ado, here are the thoughts I believe deserve deeper discussion:
1. Non-decision theory based research into AI alignment is underexplored.
   - Noematological research into AI alignment has until now been almost completely ignored.
   - There are a handful of examples where people consider how humans align themselves with their goals and use this to propose techniques for AI alignment, but they are preliminary.
2. How non-decision theory based approaches could contribute to solving AI alignment was previously poorly understood, but there is now a framework in which to work.
   - This framework is still poorly explored, though, so it's not yet clear exactly how to turn insights from human and animal behavior and thought alignment into specific, testable approaches to AI alignment.
3. Existing decision theory based research, like MIRI's, is well funded and well attended to relative to non-decision theory based research.
   - Non-decision theory based research into AI alignment is a neglected area, ripe to benefit from additional funding and attention.
   - Decision theory based research is, in absolute terms, still an area where more work can be done, and so remains underfunded and underattended relative to the amount of work it could absorb.
4. AI alignment research with simplifying assumptions different from those of decision theory is underexplored (compare 2 above).
5. Noematological AI alignment research may be a "dangerous attractor".
   - It draws ideas from disciplines that place less focus on rigor, and so may attract people unprepared for the level of care AI safety research demands, distracting the field.
   - It may give crackpots more of an angle into the field, whereas a heavy decision theory focus helps weed them out.
Comments on these statements are welcome here. If you have feedback on the original work, I recommend leaving it as a comment there or on the LW post about it.
Hi Gordon, I don't have accounts on LW or Medium so I'll comment on your original post here.
If possible, could you explain like I'm five what your working definition of the AI alignment problem is?
I find it hard to prioritize causes that I don't understand in simple terms.
I think the ELI5 on AI alignment is the same as it has always been: make nice AI. Being a little more specific, I like Russell's slightly more precise formulation, "align AI with human values", and being even more specific (without jumping to mathematical notation), I'd say we want to design AI that value what humans value, and for us to be able to believe these AI share our values.
Maybe the key thing I'm trying to get at, though, is that alignable AI will be phenomenally conscious, or in ELI5 terms, as much people as anything else we consider people (humans, animals, etc.). So my position is not just "make nice AI" but "make nice AI people we can believe are nice".
Thanks, Gordon.
"Make nice AI people we can believe are nice" makes sense to me; I hadn't been aware of the "...we can believe are nice" requirement.
I'm skeptical of your specific views on qualia, etc. (but I haven't read your arguments yet, so I withhold judgment).
Despite that skepticism, this seems like a promising area to explore at least.
I agree with your #5.
For what it's worth, I started out very much in the analytic philosophy camp and thought qualia sounded like nonsense for a long time, because much of the discussion of the idea avoids giving a precise description of what qualia are. But over time I switched sides, if you will, forced into it by trying to parsimoniously explain reality with an empiricist epistemology. This is why I generally prefer to talk about noemata (a term I gave technical meaning to avoid confusion with existing ideas) rather than qualia: it avoids the way "qualia" has become associated with all kinds of confusion.