Curious explorer of interesting ideas.
I try to write as if I were having a conversation with you in person.
I like Meditation, AI Safety, Collective Intelligence, Nature, and Civilization VI.
I would like to claim that my current safety beliefs are a mix of Paul Christiano's, Andrew Critch's and Def/Acc.
Currently the CSO of a startup working to ensure the safe deployment of collective systems of AIs into the real world. (Collective Intelligence Safety or Applied Cooperative AI, whatever you want to call it.)
I also think that Yuval Noah Harari has some of the best takes on the internet.
Thank you for that substantive response, I really appreciate it! It was also very nice that you mentioned the Turner et al. definitions, I wasn't expecting that.
(Maybe write a post on that? There's a comment that mentions uptake from major players in the EA ecosystem and maybe if you acknowledge you understand the arguments they would be more sympathetic? Just a quick thought but it might be worth engaging there a bit more?)
I just wanted to clarify some of the points I was trying to make yesterday as I do realise that they didn't all get across as I wanted them to.
I completely agree with you on the advancing progress point. I personally am quite against it at a "general" level, as I do not believe that we will be able to counterfactually change the "rowing" speed that much in the grand scheme of things. I also believe that is the conclusion of Toby's posts, if I remember correctly. Toby was rather stating that existential risk reduction is worth a lot compared to any progress that we might be able to make. "Steering" away from the bad stuff is worth more. (That's the implicit claim from the modelling even though he's as epistemically humble as you philosophers always are (which is commendable!).)
Now for the power-seeking stuff. I appreciate your careful reasoning about these things and I see what you mean in that there's no threat model from that claim in itself. If we say that the classical way it is construed is something that is equivalent to minimizing free energy, this is a tautological statement and doesn't help for existential risk.
I think I can agree with you that we're not clear enough about the existential risk angle to have a clearly defined goal for what to do. I do think there's an argument there but that we have to be quite clear with how we're defining it for it to make foundational sense. A question that arises is whether, in the process of working on it, we get more clarity about what it fundamentally is, similar to a startup figuring out what they're doing along the way? It might still be worth the resources from an unknown-unknowns perspective and from the perspective of shifting institutional practices, if that makes sense? TAI is such a big thing and it will only happen once, so spending those resources on relatively shaky foundations might still make sense?
I'm, however, not sure that this is the case and Wei Dai for example has an entire agenda about "metaphilosophy" where the claim is that we're too philosophically confused to make sense of alignment. In general, I would agree that ensuring the philosophical and mathematical basis is very important to coordinate the field and it is something I've been thinking about for a while.
I personally am trying to import ideas from existing fields that deal with generally intelligent agents in biology and cognitive science such as Active Inference and Computational Biology into the mix to see how TAI will affect society. If we see smaller branches of science as specific offshoots of philosophy then I think the places with the most rigorous thinking on the foundations are the ones that have dealt with it for a long time. I've found a lot of interesting models about misalignment in these areas that I think can be transported into the AI Safety frame.
I really appreciate the deconstructive approach that you have to the intellectual foundations of the field. I do believe that there are alternatives to the classic risk story but you have to some extent break down the flaws in the existing arguments in order to advocate for new arguments.
Finally, I think these threat models come from arguments similar to the ones in What Failure Looks Like by Paul Christiano and the "going out with a whimper" idea. This is also explored in Yuval Noah Harari's books Nexus and Homo Deus. This threat model is closer to the authoritarian capture idea than to something like a runaway intelligence explosion.
I'm looking forward to more work in this area from you!
Thank you for this post David!
I've from time to time engaged with my friends in discussion about your criticisms of longtermism and some existential risk calculations. I found that this summary post of your work and interaction clarifies my perspective on the general "inclination" that you have in engaging with the ideas, one that seems like a productive one!
Sometimes, I felt that it didn't engage with some of the core underlying claims of longtermism and existential risk, which did annoy me.
I want to respect the underlying time-spend asymmetry of the following question, as I feel I could make myself less ignorant if I had the time to spend, which I feel I currently do not have. But what are your thoughts on Toby Ord's perspective and posts on existential risk? https://forum.effectivealtruism.org/posts/XKeQbizpDP45CYcYc/on-the-value-of-advancing-progress
and:
https://forum.effectivealtruism.org/posts/hh7bgsDzP6rKZ5bbW/robust-longterm-comparisons
I felt that some of the arguments were about discount rates and that they didn't really make that much moral sense to me, and neither did person-affecting views. I have a hard time seeing the arguments for them and maybe that's just the crux of the matter.
The following will be unfair to say as I haven't spent the time required to fully understand your models but I sometimes feel that there are deeper underlying assumptions and questions that you pass by in your arguments.
I will be going to a domain I know well, AI Safety. For example, I agree that the power-seeking arguments are not fully true, especially not the early papers, yet your work doesn't engage with later follow-up work such as:
https://arxiv.org/pdf/2303.16200
Finally I believe that for the power-seeking claim, there's a large amount of evidence for power-seeking within real world systems. For me it seems an overstep to reject power-seeking due to MIRI work?
You can redefine power-seeking itself as minimizing free energy, which is in itself a theory of predictive processing or Active Inference, and that has been shown to have remarkable predictive capacity for saying useful things about systems that are alive. Yes, a specific interpretation of power-seeking may not hold true, but for me it is throwing the baby out with the bathwater.
I would love to hear your thoughts here and I'm looking forward to more good-faith discussions! (this is not sarcasm but I'm genuinely happy that you're engaging with good faith arguments!)
Edit: I do want to clarify that I do not believe that any AI system will converge towards instrumental goals and that it does make sense to question the foundations of the AI Safety assumptions and that I applaud you for doing so. It is rather a question of how much it will do so and under what conditions, in what systems it will do so. (I also made the language less combative)
I just did different combinations of the sleep supplements. You still get the confounder effects but it removes some of the cross-correlation. So glycine for 3 days with no magnesium, followed by magnesium for 3 days with no glycine, etc. It's not necessarily going to give you high accuracy, but you can see if it works or not and get a rough effect size.
I use Bearable for 3 months at a time to get a picture of what is currently working. You can track effect sizes of supplements on sleep quality, for example, if you also have a way of tracking your sleep.
Funnily enough, I noticed there was a bunch of 80/20 stuff in my day through using Bearable. I found that doing a cold shower, loving-kindness meditation in the morning and getting sunlight in the morning made something like a 30% difference in energy and enjoyment, so I now do these religiously and it has worked wonders. (I really like Bearable for these sorts of experiments.)
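For anyone who wants to do the arithmetic by hand, here is a minimal sketch of how a rough effect size could be read off an alternating log like that. Everything in it is an assumption for illustration: the nightly scores are made up, the 1-10 sleep quality scale is hypothetical, and Cohen's d is just one common choice of effect size, not necessarily what any tracking app computes.

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference between two lists of nightly scores."""
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each block's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def scores_for(log, supplement):
    """Collect the scores from nights on which the given supplement was taken."""
    return [score for name, score in log if name == supplement]

# Hypothetical log (made-up numbers): 3-day blocks alternating between supplements,
# each entry is (supplement taken, sleep quality on an assumed 1-10 scale).
log = [
    ("glycine", 7), ("glycine", 8), ("glycine", 7),
    ("magnesium", 6), ("magnesium", 7), ("magnesium", 6),
    ("glycine", 8), ("glycine", 7), ("glycine", 8),
    ("magnesium", 6), ("magnesium", 6), ("magnesium", 7),
]

glycine = scores_for(log, "glycine")
magnesium = scores_for(log, "magnesium")

diff = mean(glycine) - mean(magnesium)
d = cohens_d(glycine, magnesium)
print(f"mean difference: {diff:.2f} points, Cohen's d: {d:.2f}")
```

With this few nights the number is very noisy, and carry-over between blocks is still a confounder, so it is only useful as a "does this seem to do anything" check.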
Sorry for not noticing the comment earlier!
Here's the Claude distillation based on my reasoning on why to use it:
Reclaim is useful because it lets you assign different priorities to tasks and meetings, automatically scheduling recurring meetings to fit your existing commitments while protecting time for important activities.
For example, you can set exercising three times per week as a priority 3 task, which will override priority 2 meetings, ensuring those exercise timeblocks can't be scheduled over. It also automatically books recurrent meetings so they fit into your existing schedule, like for team members or mentors/mentees.
This significantly reduces the time and effort spent on scheduling, as you can easily add new commitments without overlapping more important tasks. The main advantage is the ability to set varying priorities for different tasks, which streamlines the process of planning weekly and monthly calls, resulting in almost no overhead for meeting planning and making it simple to accommodate additional commitments without conflicting with higher-priority tasks.
Thanks Jacques! I was looking for an upgrade to some of my LLM tools. I was looking for some IDEs and I'll check that out.
The only tip I've got is using reclaim.ai instead of Calendly for automatic meeting scheduling. It slaps.
Thanks! That post addresses what I was pointing at a lot better than I did in mine.
I can see from your response that I didn't get across my point as well as I wanted to but I appreciate the answer nonetheless!
It was more a question of what leads to the better long-term consequences rather than combining them.
It seems plausible animals have moral patienthood and so the scale of the problem is larger for animals whilst also having higher tractability. At the same time, you have cascading effects of economic development into better decision making. As a longtermist, this makes me very uncertain on where to focus resources. I will therefore put myself centrally to signal my high uncertainty.
I want to preface that I don't have a strong opinion here, just some curiosity and a question.
If we are focusing on second order effects wouldn't it make sense to bring up something like moral circle expansion and its relation to ethical and sustainable living over time as well?
From a long-term perspective, I see one of the major effects of global health being better decision making through moral circle expansion.
My question to you is then what time period you're optimising for? Does this matter for the argument?