Has there been any good, serious game-theoretic modeling of what 'AI alignment' would actually look like, given diverse & numerous AI systems interacting with billions of human individuals and millions of human groups that have diverse, complex, & heterogeneous values, preferences, and goals?
Are there any plausible models in which the AI systems, individuals, and groups can reach any kind of Pareto-efficient equilibrium?
Or, conversely, any impossibility proof showing that such a Pareto-efficient equilibrium (i.e. true 'alignment') cannot exist?
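
For concreteness, here's a minimal toy sketch of the kind of object I have in mind (emphatically not the serious modeling I'm asking about): a three-player public-goods game with made-up payoffs, one "AI" and two "human group" players, where we enumerate the pure-strategy Nash equilibria and check each for Pareto efficiency. The player names, endowment, and multiplier are all arbitrary assumptions for illustration; in this toy case the unique equilibrium turns out to be Pareto-dominated, which is exactly the gap a real model would need to close or prove unclosable.

```python
# Toy sketch only: a one-shot public-goods game with made-up numbers.
# Each player either keeps an endowment of 1 or contributes it; the pot
# is doubled and split equally. We enumerate pure-strategy Nash
# equilibria and check whether any of them is Pareto-efficient.
from itertools import product

PLAYERS = ["AI", "group_A", "group_B"]   # hypothetical players
ACTIONS = [0, 1]                         # 0 = keep endowment, 1 = contribute
ENDOWMENT = 1.0
MULTIPLIER = 2.0                         # assumed: pot doubled, then split equally

def payoffs(profile):
    """Return each player's payoff for a tuple of 0/1 contributions."""
    pot = MULTIPLIER * sum(profile) * ENDOWMENT
    share = pot / len(PLAYERS)
    return tuple((ENDOWMENT - a * ENDOWMENT) + share for a in profile)

profiles = list(product(ACTIONS, repeat=len(PLAYERS)))

def is_nash(profile):
    """No player can gain by unilaterally switching its own action."""
    for i in range(len(PLAYERS)):
        for alt in ACTIONS:
            if alt == profile[i]:
                continue
            deviation = list(profile)
            deviation[i] = alt
            if payoffs(tuple(deviation))[i] > payoffs(profile)[i] + 1e-9:
                return False
    return True

def is_pareto_efficient(profile):
    """No other profile is weakly better for all and strictly better for someone."""
    u = payoffs(profile)
    for other in profiles:
        v = payoffs(other)
        if all(vi >= ui - 1e-9 for vi, ui in zip(v, u)) and \
           any(vi > ui + 1e-9 for vi, ui in zip(v, u)):
            return False
    return True

for p in profiles:
    if is_nash(p):
        tag = "Pareto-efficient" if is_pareto_efficient(p) else "Pareto-dominated"
        print(dict(zip(PLAYERS, p)), payoffs(p), "-> Nash,", tag)
```

Running it prints a single Nash equilibrium (everyone keeps their endowment) that is Pareto-dominated by full contribution; the question is whether anyone has built models at realistic scale and heterogeneity where the equilibria and the Pareto frontier actually coincide.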
Michael - thanks very much for these links. I'll check them out!