This is a linkpost for https://futureoflife.org/wp-content/uploads/2023/04/FLI_Policymaking_In_The_Pause.pdf
This policy brief provides policymakers with concrete recommendations for how governments can manage AI risks.
Policy recommendations:
1. Mandate robust third-party auditing and certification.
2. Regulate access to computational power.
3. Establish capable AI agencies at the national level.
4. Establish liability for AI-caused harms.
5. Introduce measures to prevent and track AI model leaks.
6. Expand technical AI safety research funding.
7. Develop standards for identifying and managing AI-generated content and recommendations.
Shouldn't #6 be at the top? Expanded funding seems to be one of the most critical factors for getting more hardcore-quantitative people (e.g. polymaths, string theorists) to go into AI safety research, and it will probably be one of those people, not anyone else, who makes a really big discovery.
There are already several orders of magnitude more hardcore-quant people dithering about on pointless things like solving physics, and most of the progress we have so far has come from the tiny fraction of hardcore-quant people who ended up in AI safety research.
I think an extremely dangerous failure mode for AI safety research would be to prioritize 'hardcore-quantitative people' trying to solve AI alignment using clever technical tricks, without understanding much about the 8 billion ordinary humans they're trying to align AIs with.
If behavioral scientists aren't involved in AI alignment research -- and as skeptics about whether 'alignment' is even possible -- it's quite likely that whatever 'alignment solution' the hardcore-quantitative people invent is going to be brittle, inadequate, and risky.
I agree that behavioral science might be important to creating non-brittle alignment, and I am very, very, very, very, very bullish on behavioral science being critically valuable for all sorts of factors related to AI alignment support, AI macrostrategy, and AI governance (including but not limited to neurofeedback). In fact, I think that behavioral science is currently the core crux deciding AI alignment outcomes, and that it will be the main factor determining whether or not enough quant people end up going into alignment. I even expect that the behavioral scientists will be remembered as the superstars, with the quant people seen as interchangeable.
However, the overwhelming impression I get from the current ML paradigm is that we're largely stuck with black-box neural networks, and that these are extremely difficult and dangerous to align at all; they have a systematic tendency to generate insurmountably complex spaghetti code that is unaligned by default. I'm not an expert here (I specialized in a handful of critical elements of AI macrostrategy), but what I've seen so far indicates that the neural network "spaghetti code" is much harder to work with than the human alignment elements. I'm not strongly attached to this view, though.
Trevor -- yep, reasonable points. Alignment might be impossible, but it might be extra impossible with complex black-box networks with hundreds of billions of parameters & very poor interpretability tools.