Owen Cotton-Barratt

Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets, and so will get built at some point absent intervention. But this doesn't feel like a very strong argument -- the whole point is that we may care about accelerating applications even if the acceleration is only by a modest period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specializing to the highest-leverage epistemic and coordination tools).

Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race. 

UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.

More broadly, though:

  • I'm not sure how much "we can just outsource this" really cuts against the core of our argument (how to get something done is a question of tactics, and it could still be a strategic priority even if we just wanted to spend a lot of money on it)
  • I guess I feel, though, that you're saying this won't be a big bottleneck
    • I think that that may be true if you're considering automated alignment research in particular. But I'm not on board with that being the clear priority here

Compute allocation: mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead-of-time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.

Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:

  • Epistemics stuff -- there are lots of super smart people earnestly trying to figure out very hard questions, and I think that if you could access their thinking, there would be a lot there which would compare favourably to a lot of the data that would be collected from people in this community. It wouldn't be so targeted in terms of the questions it addressed (e.g. "AI strategy"), but learning good epistemics may be valuable and transfer over
  • International negotiation, and high-stakes bargaining in general -- potentially very important, but not something I think our community has any particular advantage at
  • Synthetic data -- a bunch of things may be unlocked more by working out how to enable "self-play" (or the appropriate analogue), rather than just collecting more data the hard way

It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.

I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:

 

There are some other activities which might help make the future better without doing so much to increase the chance of having a future, e.g.:

  • Try to propagate "good" values (I first wrote "enlightenment" instead of "good", since I think the truth-seeking element is especially important for ending up somewhere good; but others may differ), to make it more likely that they're well-represented in whatever entities end up steering
  • Work to anticipate and reduce the risk of worst-case futures (e.g. by cutting off the types of process that might lead there)

However, these activities don't (to me) seem as high leverage for improving the future as the more mixed-purpose activities.

Ughh ... baking judgements about what's morally valuable into the question somehow doesn't seem ideal. Like I think it's an OK way to go for moral ~realists, but among anti-realists you might have people persistently disagreeing about what counts as extinction.

Also like: what if you have a world which is like the one you describe as an extinction scenario, but there's a small amount of moral value in some subcomponent of that AI system? Does that mean it no longer counts as an extinction scenario?

I'd kind of propose instead using the typology Will proposed here, and making the debate between (1) + (4) on the one hand vs (2) + (3) on the other.

Fairly strong agree -- I'm personally higher on all of (2), (3), (4) than I am on (1).

The main complication is that I think among realistic activities we can pursue, often they won't correspond to a particular one of these, but will instead have beneficial effects on multiple. But I still think it's worth asking "which is it high priority to make plans targeting?", even if many of the best plans end up being those which aren't so narrow as to target one to the exclusion of the others.

This is right. But to add even more complication:

  • I think most AI x-risk (in expectation) doesn't lead to human extinction, but a noticeable fraction does
  • But a lot even of the fraction that leads to human extinction seems to me like it probably doesn't count as "extinction" by the standards of this question, since it still has the earth-originating intelligence which can go out and do stuff in the universe
    • However, I sort of expect people to naturally count this as "extinction"?

Since it wasn't cruxy for my rough overall position, I didn't resolve this last question before voting, although maybe it would get me to tweak my position a little.

86% disagree

To some extent I reject the question as not-super-action-guiding (I think that a lot of work people do has impacts on both things).

But taking it at face value, I think that AI x-risk is almost all about increasing the value of futures where "we" survive (even if all the humans die), and deserves most attention. Literal extinction of earth-originating intelligence is mostly a risk from future war, which I do think deserves some real attention, but isn't the main priority right now.

IMO the betting odds framing gets things backwards. Bets are decisions, which are made rational by whether the beliefs they’re justified by are rational. I’m not sure what would justify the betting odds otherwise.

Not sure what I overall think of the betting odds framing, but to speak in its defence: I think there's a sense in which decisions are more real than beliefs. (I originally wrote "decisions are real and beliefs are not", but they're both ultimately abstractions about what's going on with a bunch of matter organized into an agent-like system.) I can accept the idea of X as an agent making decisions, and ask what those decisions are and what drives them, without implicitly accepting the idea that X has beliefs. Then "X has beliefs" is kind of a useful model for predicting their behaviour in decision situations. Or could be used (as you imply) to analyse the rationality of their decisions.


I like your contrived variant of the pi case. But to play on it a bit:

  • Maybe when I first find out the information on Sally, I quickly eyeball and think that defensible credences probably lie within the range 30% to 90%
  • Then later when I sit down and think about it more carefully, I think that actually the defensible credences are more like in the range 40% to 75%
  • If I thought about it even longer, maybe I'd tighten my range a bit further again (45% to 55%? 50% to 70%? I don't know!)

In this picture, no realistic amount of thinking I'm going to do will bring it down to just a point estimate being defensible, and perhaps even the limit with infinite thinking time would have me maintain an interval of what seems defensible, so some fundamental indeterminacy may well remain.
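One way to make this picture a bit more precise (just a sketch, with notation I'm introducing here): let $I_t = [\ell_t, u_t]$ be the range of credences that seem defensible after $t$ units of thinking. Then the process above looks like a nested sequence

$$I_1 \supseteq I_2 \supseteq I_3 \supseteq \dots$$

and the question of fundamental indeterminacy is whether the intersection $\bigcap_t I_t$ eventually shrinks to a single point, or stays a nontrivial interval no matter how much thinking goes in.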

But to my mind, this kind of behaviour where you can tighten your understanding by thinking more happens all of the time, and is a really important phenomenon to be able to track and think clearly about. So I really want language or formal frameworks which make it easy to track this kind of thing.

Moreover, after you grant this kind of behaviour [do you grant this kind of behaviour?], you may notice that from our epistemic position we can't even distinguish between:

  • Cases where we'd collapse our estimated range of defensible credences down to a very small range or even a single point with arbitrary thinking time, but where in practice progress is so slow that it's not viable
  • Cases where even in the limit with infinite thinking time, we would maintain a significant range of defensible credences

Because of this, from my perspective the question of whether credences are ultimately indeterminate is ... not so interesting? It's enough that in practice a lot of credences will be indeterminate, and that in many cases it may be useful to invest time thinking to shrink our uncertainty, but in many other cases it won't be.
