Thanks for your response. I'm unsure if we're importantly disagreeing. But here are some reactions.
I feel unsure about what you are saying, exactly [...] In the case that alignment goes well and there is a long reflection—i.e., (1) and (2) turn out true—my position is that doing AI welfare work now has no effect on the future, because all AI welfare stuff gets solved in the long reflection [...] In the case that alignment goes well but there is no long reflection—i.e., (1) turns out true but (2) turns out false
I read this as equating (1) with "alignment goes well" and (2) with "there is a long reflection". These understandings of (1) and (2) are rather different from the original formulations of (1) and (2), which were what I had in mind when I responded to an objection by saying I didn't see how buying (1) and (2) undermined the point I was making. Crucially, I was understanding (2) as equivalent to the claim that if alignment goes well, then a long reflection probably happens by default. I don't know if we were on the same page about what (1) and (2) say, so I don't know if that clears things up. In case not, I'll offer a few more-substantive thoughts (while dropping further reference to (1) and (2) to avoid ambiguity).
I think digital minds takeoff going well (again, for digital minds and with respect to existential risk) makes it more likely that alignment goes well. So, granting (though I'm not convinced of this) that alignment going well and the long reflection are key for how well things go in expectation for digital minds, I think digital minds takeoff bears on that expectation via alignment. In taking alignment going well to be sensitive to how takeoff goes, I am denying that alignment going well is something we should treat as given independently of how takeoff goes. (I'm unsure if we disagree here.)
In a scenario where alignment does not go well, I think it's important that a digital minds takeoff not have happened yet or, failing that, that digital minds' susceptibility to harm be reduced before things go off the rails. In that scenario, it would be good to have had a portfolio of measures in place that reduce risks of suffering and mistreatment, including ones that nudge AIs away from having certain preferences and ones that disincentivize creating AIs with various other candidates for morally relevant features. I take such interventions to be within the purview of AI welfare as an area, partly because what counts as being in the area is still up for grabs and such interventions seem natural to include, and partly because they are in line with what people working in the area have been saying (e.g., a coauthor and I suggest related risk-reducing interventions in Sec. 4.2.2 of our article on digital suffering and Sec. 6 of our draft on alignment and ethical treatment). That said, I agree that CLR folks have made related points and that the associated suffering discourse feels outside the AI welfare field.
I'm also not sold on (2). But I don't see how buying (1) and (2) undermines the point I was making. If takeoff going well makes the far future go better in expectation for digital minds, it could do so via alignment or via non-default scenarios.
Re "I do wish people like yourself arguing for AI welfare as a cause area were clearer about whether they are making neartermist or longtermist arguments": that makes sense. Hopefully it's now clearer that I take considerations put forward in the post to be relevant under neartermist and longtermist assumptions. Perhaps worth adding: as someone already working in the area, I had material developed for other purposes that I thought worth putting forward for this debate week, per its (by my lights) discussing relevant considerations or illustrating the tractability of work in the area. But the material wasn't optimized for addressing whether AI welfare should be a cause area, and optimizing it for that didn't strike me as the most productive way for me to engage given my time constraints. (I wonder if something like this may apply more generally and help explain the pattern you observe.)
I'd be excited for AI welfare as an area to include a significant amount of explicitly longtermist work. I also find it plausible that heuristics like the one you mention will connect a lot of AI welfare work to the cosmic endowment. But I'm not convinced that it would generally be a good idea to explicitly apply such heuristics in AI welfare work, even for people who (unlike me) are fully convinced of longtermism. I expect a lot of work in this area to be valuable as building blocks that can be picked up from a variety of (longtermist and neartermist) perspectives, and I expect such work's value often won't be enhanced by explicitly discussing how to build with it from different perspectives or how the authors would build on it given their own perspective. I also worry that if AI welfare work were generally framed in longtermist terms whenever applicable (even when robust to longtermism vs. neartermism), that could severely limit the impact of the area.
Could you expand on what you regard as the key difference between our epistemic position with respect to animals and our epistemic position, even in theory, with respect to AI systems? Could this difference be put in terms of a claim you accept when applied to animals but reject, even in theory, when applied to AI systems?
In connection with evaluating animal/AI consciousness, you mention behavior, history, incentives, purpose, and mechanism. Do you regard any of these factors as most directly relevant to consciousness? Are any of these only relevant as proxies for, say, mechanisms?
(My hunch is that more information on these points would make it easier for me or other readers to try to change your mind!)
Re “A crux here is that philosophy of mind doesn't really make much progress”: for what it’s worth, from the inside of the field, it feels to me like philosophy of mind makes a lot of progress, but (i) the signal-to-noise ratio in the field is bad, (ii) the field is large, sprawling, and uncoordinated, (iii) an impact-focused mindset is rare within the field, and (iv) only a small percentage of the effort in the field has been devoted to producing research that is directly relevant to AI welfare. This suggests to me that even if there isn’t a lot of relevant, discernible-from-the-outside progress in philosophy of mind, relevant progress may be fairly tractable.
If digital minds takeoff goes well (rather than badly) for digital minds and with respect to existential risk, would we expect a better far future for digital minds? If so, then I'm inclined to think some considerations in the post are at least indirectly important to digital mind value stuff. If not, then I'm inclined to think that the digital mind value stuff we have a clue about how to positively affect is not in the far future.
(I like the question and examples!)
I take motivations for the biological requirement and for considering it to be empirical rather than a priori.
One motivation for the biological requirement is that, in the cases we know about, fine-grained differences in consciousness seem to be systematically and directly underpinned by biological differences. This makes the biological requirement more plausible than many other claims at the same level of specificity.
While there isn’t a corresponding motivation for the temperature and timescale claims, there are related motivations: at least in humans, operating in those ranges is presumably required for the states that are known to systematically and directly vary with fine-grained differences in consciousness; going towards either end of the 30–50 °C temperature range also seems to render us unconscious, which suggests that going outside the range would do so as well.
Looking beyond the human case, I take it that the fact that certain animals operate outside the 30–50 °C range makes the temperature claim less plausible than the biological requirement. Admittedly, if we widen the temperature range enough, the resulting temperature claim will be as plausible as the biological requirement. But the resulting temperature claim’s plausibility will presumably be inherited from claims (such as the biological requirement) that are more informative (hence more worthy of consideration) with respect to which systems are conscious.
As for the distance claim, perhaps it would be plausible if one had Aristotelian cosmological beliefs! But I take it we now have good reason to think that the physical conditions that can exist on Earth can also exist far beyond it and that fundamental laws don’t single out Earth or other particulars for special treatment. Even before considering correlational evidence regarding consciousness, this suggests that we should find it implausible that consciousness depends on having a substrate within a certain distance from Earth’s center. Correlational evidence reinforces that implausibility: local physical conditions are strongly predictive of known conscious differences independently of appeal to distance from Earth’s center, and we don’t know of any predictive gains to be had by appealing to distance from Earth’s center. Another reason to doubt the distance claim is that it suggests a remarkable coincidence: the one planet around which that candidate requirement can be met just so happens to be a planet around which various other requirements for consciousness happen to be met, even though the latter requirements are met around only a small percentage of planets.
Setting aside plausibility differences, one reason to consider the biological requirement in particular is that it rules out AI consciousness, whereas the temperature, timescale, and distance claims are compatible with AI consciousness (though they do have important-if-true implications concerning which AI systems could be conscious).
All that said, I’m sympathetic with thinking that there are other candidate barriers to AI consciousness that are as well-motivated as the biological requirement but neglected. My motivation in writing the draft was that, given that biology has been and will continue to be brought to bear on the possibility of AI consciousness, it should be brought to bear via the biological requirement rather than via even more specific and less crucial theses about biology and consciousness that are often discussed.
Re how I see digital minds takeoff going well as aiding alignment: the main paths I see go through digital minds takeoff happening after we figure out alignment. That’s because I think aligning AIs that merit moral consideration without mistreating them adds an additional layer of difficulty to alignment. (My coauthor and I go into detail about this difficulty in the second paper I linked in my previous comment.) So if a digital minds takeoff happens while we're still figuring out alignment, I think we'll face tradeoffs between alignment and ethical treatment of digital minds, and that this bodes poorly for both alignment and digital minds takeoff.
To elaborate in broad strokes: even supposing that, for longtermist reasons, the importance of alignment going well dwarfs the importance of digital minds' welfare during takeoff, key actors may not agree. If a digital minds takeoff is already underway, they may trade some probability of alignment going well for improved treatment of digital minds.
Upon noticing our willingness to trade safety for ethical treatment, the critical-to-align AIs we’re trying to align may exploit that willingness, e.g., by persuading key actors that they (the AIs) merit more moral consideration; this could in turn make those systems less safe and/or lead to epistemic distortions about which AIs merit moral consideration.
This vulnerability could perhaps be avoided by resolving not to give consideration to AI systems until after we've figured out alignment. But if AIs merit moral consideration during the alignment process, this policy could result in AIs that are aligned to values which are heavily biased against digital minds. I would count that outcome as one way for alignment to not go well.
I think takeoff happening before we’ve figured out alignment would also risk putting more-ethical actors at a disadvantage in an AGI/ASI race: if takeoff has already happened, there will be an ethical treatment tax. As with a safety tax, paying the ethical treatment tax may lower the probability of winning while also correlating with alignment going well conditional on winning. There’s also the related issue of race dynamics: even if all actors are inclined toward the ethical treatment of digital minds, if they each think it more crucial that they win, we should expect the winner to have cut corners on ethical treatment if the systems they’re trying to align merit moral consideration.
In contrast, if a digital minds takeoff happens after alignment, I think we’d have a better shot at avoiding these tradeoffs and risks.
If a digital minds takeoff happens before alignment, I think it’d still tend to be better in expectation for alignment if the takeoff went well. If takeoff went poorly, I’d guess that would be because we had decided not to extend moral consideration to digital minds and/or because we had made important mistakes about the epistemology of digital mind welfare. I think those factors would make it more likely that we align AIs with values that are biased against digital minds or with importantly mistaken beliefs about digital minds. (I don’t think there’s any guarantee that these values and beliefs would be corrected later.)
Re uploading: while co-writing the digital suffering paper, I thought whole brain emulations (not necessarily uploads) might help with alignment. I’m now pessimistic about this, partly because whole brain emulation currently seems to me very unlikely to arrive before critical attempts at alignment, partly because I’m particularly pessimistic about whole brain emulations being developed in a morally acceptable manner, and partly because of the above concerns about a digital minds takeoff happening before we’ve figured out alignment. (But I don’t entirely discount the idea—I’d probably want to seriously revisit it in the event of another AI winter.)
This exchange has been helpful for me! It’s persuaded me to think I should consider doing a project on AI welfare under neartermist vs. longtermist assumptions.