On episode #893 of Russ Roberts's EconTalk podcast, guest Tyler Cowen challenges Eliezer Yudkowsky and the LessWrong/EA alignment communities to develop a mathematical model for AI X-Risk.
Will Tyler Cowen agree that an 'actual mathematical model' for AI X-Risk has been developed by October 15, 2023?
https://manifold.markets/JoeBrenton/will-tyler-cowen-agree-that-an-actu?r=Sm9lQnJlbnRvbg
(This market resolves to "YES" if Tyler Cowen publicly acknowledges, by October 15, 2023, that an actual mathematical model of AI X-Risk has been developed.)
Two excerpts from the conversation:
https://youtube.com/clip/Ugkxtf8ZD3FSvs8TAM2lhqlWvRh7xo7bISkp
...But, I mean, here would be my initial response to Eliezer. I've been inviting people who share his view simply to join the discourse. So, they have the sense, 'Oh, we've been writing up these concerns for 20 years and no one listens to us.' My view is quite different. I put out a call and asked a lot of people I know, well-informed people, 'Is there any actual mathematical model of this process of how the world is supposed to end?'
So, if you look, say, at COVID or climate change fears, in both cases, there are many models you can look at, including--and then models with data. I'm not saying you have to like those models. But the point is: there's something you look at and then you make up your mind whether or not you like those models; and then they're tested against data...
https://youtube.com/clip/Ugkx4msoNRn5ryBWhrIZS-oQml8NpStT_FEU
...So, when it comes to AGI and existential risk, it turns out as best I can ascertain, in the 20 years or so we've been talking about this seriously, there isn't a single model done. Period. Flat out.
So, I don't think any idea should be dismissed. I've just been inviting those individuals to actually join the discourse of science. 'Show us your models. Let us see their assumptions and let's talk about those.'...
Related:
Will there be a funding commitment of at least $1 billion in 2023 to a program for mitigating AI risk?
https://manifold.markets/JoeBrenton/will-there-be-a-funding-commitment?r=Sm9lQnJlbnRvbg
Will the US government launch an effort in 2023 to augment human intelligence biologically in response to AI risk?
https://manifold.markets/JoeBrenton/will-the-us-government-launch-an-ef?r=Sm9lQnJlbnRvbg
Will the general public in the United States become deeply concerned by LLM-facilitated scams by Aug 2 2023?
https://manifold.markets/JoeBrenton/will-the-general-public-in-the-unit?r=Sm9lQnJlbnRvbg
I built a preliminary model here: https://colab.research.google.com/drive/108YuOmrf18nQTOQksV30vch6HNPivvX3?authuser=2
It's definitely too simple to treat as strong evidence, but it shows some interesting dynamics. For example, levels of alignment rise at first, then fall rapidly once AI deception skills exceed human oversight capacity. I sent it to Tyler, and he agreed it was cool but not actual evidence.
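The qualitative shape described above can be reproduced with a toy simulation. This is my own minimal sketch, not the linked Colab notebook: the variable names, growth rates, and update rule are all illustrative assumptions, chosen only so that alignment improves while oversight outpaces deception and degrades once deception overtakes it.

```python
def simulate(steps=100, oversight=1.0, deception_growth=1.05):
    """Toy alignment dynamics (illustrative assumptions, not the Colab model).

    Alignment drifts up while human oversight exceeds AI deception skill,
    then drifts down once deception skill compounds past oversight capacity.
    """
    alignment, deception = 0.5, 0.1
    history = []
    for _ in range(steps):
        deception *= deception_growth        # deception skill compounds each step
        gap = oversight - deception          # positive while oversight still wins
        alignment += 0.02 * gap              # alignment tracks the oversight gap
        alignment = min(max(alignment, 0.0), 1.0)  # clip to [0, 1]
        history.append(alignment)
    return history

traj = simulate()
# Trajectory rises to a peak, then collapses after deception overtakes oversight.
```

With these parameters deception crosses the oversight threshold around step 47, which is where the trajectory peaks before declining.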
If anyone wants to work on improving this, feel free to reach out!
While I don't have a high opinion of AI risk research, a mathematical model is the last thing it needs.
There is radical uncertainty about the technological paths opened by AI, whether those paths end in AGI, and what preferences an AGI would have at the beginning and how they would evolve. Any mathematical modelling at this stage would be a pure "pretense of knowledge": an exercise even more sterile than the numbers war over whether there is a 1%, a 10%, or a 99% probability of AI doom.
It is time to explore the technology and to make researchers sensitive to risks. In fact, I think AI safety does not yet exist as an independent field of knowledge, and the mathematization of (almost) nothing is even worse than nothing.
There is the Carlsmith model: https://arxiv.org/abs/2206.13353
It is not very complicated, though, and it is conjunctive (which does not seem a good fit for AI X-risk). I doubt that Tyler Cowen will like it.
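"Conjunctive" here means the headline probability is the product of several premise probabilities, so it can only shrink as premises are added, which is the structural objection above. A minimal sketch of that arithmetic, using made-up premise labels and numbers rather than Carlsmith's actual figures:

```python
from math import prod

# Hypothetical premise probabilities (illustrative, NOT Carlsmith's numbers).
# Each entry is read as P(premise | all earlier premises hold).
premises = {
    "advanced agentic AI is built this century": 0.7,
    "misaligned systems get deployed anyway": 0.5,
    "deployed misaligned systems seek power at scale": 0.4,
    "power-seeking escalates to existential catastrophe": 0.3,
}

# Conjunctive structure: multiply the conditional probabilities together.
p_doom = prod(premises.values())
# 0.7 * 0.5 * 0.4 * 0.3 = 0.042
```

Note how each added premise, even a fairly likely one, pushes the product down; a disjunctive model (risk through any of several independent routes) would instead push the estimate up as routes are added.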
I'd love to identify a well-regarded economist to develop AI risk models, if there were funding for it.