Derek Shiller

Researcher @ Rethink Priorities

We have heard from some organizations that have taken a close look at the CCM, and that has spawned some back-and-forth about the takeaways. I don't think I can disclose anything more specific at this point, though perhaps we will be able to in the future.

Thanks for reporting this. You found an issue that was introduced when we converted data from years to hours; we overlooked the place in the code where that value was generated. It is fixed now. The intended range is half a minute to 37 minutes, with a mean of a little under 10 minutes. I'm not entirely sure where the exact numbers for that parameter come from, since Laura Duffy produced that part of the model and has since moved on to another org, but I believe it is inspired by this report. As you point out, that is less than three hours of disabling-equivalent pain. I'll have to dig deeper to figure out the rationale here.

After working on WIT, I’ve grown a lot more comfortable producing provisional answers to deep questions. In similar academic work, there are strong incentives to only try to answer questions in ways that are fully defensible: if there is some other way of going about it that gives a different result, you need to explain why your way is better. For giant nebulous questions, this means we will make very slow progress on finding a solution. Since these questions can be very important, it is better to come up with some imperfect answers rather than just working on simpler problems. WIT tries to tackle big important nebulous problems, and we have to sometimes make questionable assumptions to do so. The longer I’ve spent here, the more worthwhile our approach feels to me.

One of the big prioritization changes I’ve taken away from our tools is within longtermism. Playing around with our Cross-Cause Cost-Effectiveness Model, it was clear to me that so much of the expected value of the long-term future comes from the direction we expect it to take, rather than just from whether it happens at all. If you can shift that direction even a little, it makes a huge difference to overall value. I no longer think that extinction risk work is the best kind of intervention if you’re worried about the long-term future. I tend to think that non-safety AI policy work would come out as more impactful in expectation if we worked through all of the details.
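
To illustrate the kind of comparison I have in mind (a toy back-of-the-envelope with made-up numbers, not outputs of the model):

```python
# Toy illustration with invented numbers (not CCM outputs): expected value of the
# long-term future decomposed as P(survival) * E[value | survival].
p_survival = 0.8       # chance we avoid extinction
quality = 0.10         # expected value conditional on survival, as a share of the best case

baseline = p_survival * quality                  # 0.080

# Cut extinction risk by a quarter (0.20 -> 0.15):
less_risk = 0.85 * quality                       # 0.085

# Shift the expected direction of the future up by a quarter (0.10 -> 0.125):
better_direction = p_survival * quality * 1.25   # 0.100

print(baseline, less_risk, better_direction)
```

On numbers like these, a proportional improvement in the future’s expected trajectory buys more than the same proportional cut in extinction risk. The point is only directional, since everything depends on the inputs.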

Thanks for raising this point. We think that choosing the right decision theory to handle imprecise probabilities is a complex issue that has not been adequately resolved. We take the point that Mogensen’s conclusions have radical implications for the EA community at large, and we haven’t formulated a compelling story about where Mogensen goes wrong. However, we also believe there are likely to be solutions that avoid those radical implications, so we don’t need to bracket all cause prioritization work until we find them. Our tools may only be useful to those who think there is worthwhile work to be done on cause prioritization.

As a practical point, our Cross-Cause Cost-Effectiveness Model works with precise probabilities: it uses Monte Carlo methods, randomly selecting a value for each parameter from a specified distribution in each simulation. We noted hesitance about enforcing a specific distribution over our range of radical uncertainty, but we stand behind this as a reasonable choice given our pragmatic aims. If the alternative is not to try to calculate relative expected values at all, we think that would be a loss, even if methodological doubts still attach to our own results.
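
As a rough sketch of what that looks like in practice (illustrative Python with invented parameters and distributions, not the model’s actual code), each simulation draws one value per parameter from its distribution, and the spread of results across simulations approximates a distribution over cost-effectiveness:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000

# Invented parameters for illustration only; the real model has its own
# parameters, distributions, and outcome structure.
cost_per_unit = rng.lognormal(mean=np.log(100), sigma=0.5, size=n_sims)  # dollars per unit delivered
effect_per_unit = rng.normal(loc=2.0, scale=0.8, size=n_sims)            # benefit per unit (arbitrary units)
p_success = rng.beta(a=2, b=5, size=n_sims)                              # chance the intervention works at all

benefit_per_dollar = p_success * effect_per_unit / cost_per_unit

print("mean benefit per dollar:", benefit_per_dollar.mean())
print("5th-95th percentile:", np.percentile(benefit_per_dollar, [5, 95]))
```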

We appreciate your perspective; it gives us a chance to clarify our goals. The case you refer to was intended as an example of the ways in which normative uncertainty matters, and we did not mean for the views there to accurately model real-world moral dilemmas or the span of reasonable responses to them.

However, you might also object that our moral parliament tool doesn’t really make it possible to incorporate intrinsic valuing of natural environments. Some might see this as an oversight. Others might be concerned about other missing subjects of human concern: respect for God, proper veneration of our ancestors, aesthetic value, etc. We didn’t design the tool to encompass the full range of human values, but to reflect the major components of the values of the EA community (which is predominantly consequentialist and utilitarian). It is beyond the scope of this project to assess whether those values should be exhaustive. That said, we don’t think strict attachment to the values in the tool is necessary for deriving insights from it, and we think it models approaches to normative uncertainty well even if it doesn’t capture the full range of subjects of human normative uncertainty.

For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.

I disagree with this. With existential risk from unaligned AI, I don't think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn't clear how standard AI safety research would address a very specific disaster scenario. I don't think this is a problem: we shouldn't expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we're not fumbling through basic things during a critical period. I think the same applies to digital minds.

Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).

I think this viewpoint is overly optimistic about the probability of lock-in and about the relevance of superintelligent advisors. I discuss some of the issues around lock-in in my contribution to the debate week. In brief, I think it is possible that digital minds will be sufficiently integrated into society within the next few decades that they will hold power and stand in social relationships that would be extremely difficult to disentangle. I also think that AGI may be useful for drawing inferences from our assumptions, but won't be particularly helpful at setting the right assumptions.

I generally agree that the formal thesis for the debate week set a high bar that is difficult to defend, and I think this is a good statement of the case for that. Even if you think that AI welfare is important (which I do!), the field doesn't have the existing talent pipelines or clear strategy to absorb $50 million in new funding each year. Putting that much in over the next few years could easily make things worse. It is also possible that AI welfare could attract non-EA money, and that it should aim for that rather than taking money that would otherwise go to other EA cause areas.

That said, there are other points that I disagree with:

It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority, it is not good enough to simply defer to Carl Shulman's views if you yourself can't argue why you think it's "pretty likely... that there will be vast numbers of AIs that are smarter than us" and why those AIs deserve moral consideration.

I think that this is wrong. The fact that something might have a huge scale and that we might be able to do something about it is enough for it to be taken seriously, and it provides prima facie evidence that it should be a priority. I think it is vastly preferable to preempt problems before they occur rather than to try to fix them after they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here within the next decade. If we don't start thinking about it soon, then we may be years behind when it happens.

AI people (of some form or other) are not exactly a purely hypothetical technology, and the epistemic case for them doesn't seem fundamentally different from the case for thinking that AI safety will be an existential issue in the future, that the average intensively farmed animal leads a net-negative life, or that any given global health intervention won't have significant unanticipated negative side effects. We're dealing with deep uncertainties no matter what we do.

Additionally, it might be much harder to lobby for changes once things have gone wrong. I wish some groups had been actively lobbying against intensified animal agriculture in the 1930s (or the 1880s). It may not have been tractable, and the need may not have been clear, but it might have been possible to outlaw some terrible practices before they were adopted. We might have that opportunity now with AI welfare. Perhaps this means that we only need a small core group, but I do think some people should make it a priority.

I stick by my intuition, but it is really just an intuition about human behavior. Perhaps some people would be completely unbothered in that situation. Perhaps most would. (I actually find that possibility worrisome in a different way, because it suggests that people may easily overlook AI wellbeing. Perhaps you have the right reasons for happily ignoring their anguished cries, but not everyone will.) This is an empirical question, really, and I don’t think we’ll know how people will react until it happens.

How could they not be conscious?

It is rare for theories of consciousness to make any demands on motivational structure.

  • Global workspace theory, for instance, says that consciousness depends on having a central repository through which different cognitive modules talk to each other. If the modules were instead to communicate directly, point to point, there would be no conscious experiences (by that theory). I see no reason, in that case, why decision making would have to rely on different mechanisms.
  • Higher-order theories suggest that consciousness depends on having representations of our own mental states. A creature could have all sorts of direct concerns that it never reflected on, and these could look a lot like ours.
  • Integrated information theory (IIT) suggests that you could have a high-level duplicate of a conscious system that was nonetheless unconscious because of its fine-grained details.
  • Etc. 

The specific things you would need to change in the robots to render them not conscious depend on your theory, but I don’t think you need to go so far as to make them a lookup table or a transformer.


My impression was that you favor theories on which the mechanisms behind our judgments about the weirdness of consciousness are critical to conscious experience. I could imagine a robot just like us but totally non-introspective, lacking phenomenal concepts, etc. Would you think such a thing was conscious? Could it not desire things in something like the way we do?

There's another question about whether I'd actually dissect one, and maybe I still wouldn't, but this could be for indirect or emotional reasons. It could still be very unpleasant or even traumatic for me to dissect something that cries out, and to do so against the desperate pleas of its mother. Or, it could be bad to become less sensitive to such responses, when such responses often are good indicators of risk of morally significant harm. People who were confident nonhuman animals don't matter in themselves sometimes condemned animal cruelty for similar reasons.

This supports my main argument. If you value conscious experience, these emotional reasons could be concerning for the long-term future. It seems like a slippery slope from being nice to them because we find it more pleasant, to thinking that they are moral patients, particularly if we frequently interact with them. It is possible that our generation will never stop caring about consciousness, but if we’re not careful, our children might.
