
Scott Alexander


Comments (49)

Habryka referred me to https://forum.effectivealtruism.org/posts/A47EWTS6oBKLqxBpw/against-anthropic-shadow , whose "Possible Solution 2" is what I was thinking of. It looks like anthropic shadow holds if you think there are many planets (which seems true) and you are willing to accept weird things about reference classes (which seems like the price of admission to anthropics). I appreciate the paper you linked for helping me distinguish between the claim that anthropic shadow is transparently true without weird assumptions, vs. the weaker claim in Possible Solution 2 that it might be true with about as much weirdness as all the other anthropic paradoxes.

I'm having trouble understanding this. The part that comes closest to making sense to me is this summary:

> The fact that life has survived so long is evidence that the rate of potentially omnicidal events is low...[this and the anthropic shadow effect] cancel out, so that overall the historical record provides evidence for a true rate close to the observed rate.

Are they just applying https://en.wikipedia.org/wiki/Self-indication_assumption_doomsday_argument_rebuttal to anthropic shadow without using any of the relevant terms, or is it something else I can't quite get?

Also, how would they respond to the fine-tuning argument? That is, it seems like most planets (let's say 99.9%) cannot support life (eg because they're too close to their sun). It seems fantastically surprising that we find ourselves on a planet that does support life, but anthropics provides an easy way out of this apparent coincidence. That is, anthropics tells us that we overestimate the frequency of things that allow us to be alive. This seems like reverse anthropic shadow, where anthropic shadow is underestimating the frequency of things that cause us to be dead. So is the paper claiming that anthropics does change our estimates of the frequency of good things, but can't change our estimate of the frequency of bad things? Why would this be?
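To make sure I'm picturing the selection effect right, here's a toy Monte Carlo (mine, not the paper's; every number in it is made up) showing both effects at once: survivors always find themselves on habitable planets, and their historical records understate the true catastrophe rate.

```python
import random

random.seed(0)

N_PLANETS = 1_000_000
P_HABITABLE = 0.001   # assumed fraction of planets that can support life at all
P_EVENT = 0.3         # assumed per-era chance of a potentially omnicidal event
P_LETHAL = 0.9        # assumed chance such an event actually ends the lineage
N_ERAS = 5            # eras of history a surviving observer can look back on

survivor_estimates = []
for _ in range(N_PLANETS):
    if random.random() >= P_HABITABLE:
        continue  # lifeless planet: contributes no observers, so no data point
    near_misses = 0
    extinct = False
    for _ in range(N_ERAS):
        if random.random() < P_EVENT:
            if random.random() < P_LETHAL:
                extinct = True
                break  # lineage wiped out: nobody is left to compile a record
            near_misses += 1  # the event happened but was survivable
    if extinct:
        continue
    # A surviving observer estimates the event rate from the record they can see.
    survivor_estimates.append(near_misses / N_ERAS)

print("surviving observers:", len(survivor_estimates))
print("mean estimated event rate:", sum(survivor_estimates) / len(survivor_estimates))
print("true event rate:", P_EVENT)
# Every observer finds themselves on a habitable planet even though only 0.1% of
# planets are habitable (the fine-tuning "coincidence"), and their mean estimated
# event rate comes out far below 0.3 (the anthropic shadow).
```

In this toy version the two selection effects look symmetrical, which is why I'm confused about a paper that keeps one and cancels the other.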

I mostly agree with this. The counterargument I can come up with is that the best AI think tanks right now are asking for grants in the range of $2 - $5 million and seem to be pretty influential, so it's possible that a grantmaker who got $8 million could improve policy by 5%, in which case it's correct to equate those two. 

I'm not sure how that fits with the relative technical/policy questions.

Yes, I added them partway through after thinking about the question set more.

The article was obviously terrible, and I hope the listed mistakes get corrected, but I haven't seen a request for correction on the claim that CFAR/Lightcone has $5 million of FTX money and isn't giving it back. Is there any more information on whether this is true and, if so, what their reasoning is?

I think this is more over-learning and institutional scar tissue from FTX. The world isn't divided into Bad Actors and Non-Bad-Actors such that the Bad Actors are toxic and will destroy everything they touch.

There's increasing evidence that Sam Altman is a cut-throat businessman who engages in shady practices. This also describes, for example, Bill Gates and Elon Musk, both of whom also have other good qualities. I wouldn't trust either of them to single-handedly determine the fate of the world, but they both seem like people who can be worked with in the normal paradigm of different interests making deals with each other while appreciating a risk of backstabbing.

I think "Sam Altman does shady business practices, therefore all AI companies are bad actors and alignment is impossible" is a wild leap. We're still in the early (maybe early middle) stages of whatever is going to happen. I don't think this is the time to pick winners and put all eggs in a single strategy. Besides, what's the alternative? Policy? Do you think politicians aren't shady cut-throat bad actors? That the other activists we would have to work alongside aren't? Every strategy involves shifting semi-coalitions with shady cut-throat bad actors of some sort of another, you just try to do a good job navigating them and keep your own integrity intact.

If your point is "don't trust Sam Altman absolutely to pursue our interests above his own", point taken. But there are vast gulfs between "don't trust him absolutely" and "abandon all strategies that come into contact with him in any way". I think the middle ground here is to treat him approximately how I think most people here treat Elon Musk. He's a brilliant but cut-throat businessman who engages in lots of shady practices. He seems to genuinely have some kind of positive vision for the world, or to want for PR reasons to seem like he has a positive vision for the world, or to have a mental makeup incapable of distinguishing those two things. He's willing to throw the AI safety community the occasional bone when it doesn't interfere with business too much. We don't turn ourselves into the We Hate Elon Musk movement or avoid ever working with tech companies because they contain people like Elon Musk. We distance ourselves from him enough that his PR problems aren't our PR problems (already done in Sam's case; thanks to the board, the average person probably thinks of us as weird anti-Sam-Altman fanatics), describe his positive and negative qualities honestly if asked, try to vaguely get him to take whatever good advice we have that doesn't conflict with his business too much, and continue having a diverse portfolio of strategies at any given time. Or, I mean, part of the shifting semi-coalitions is that if some great opportunity to get rid of him comes along, we compare him to the alternatives and maybe take it. But we're so far away from having that alternative that pining after it is a distraction from the real world.

I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.

Eliezer's argument is that you can become the kind of entity that is programmed to do X by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you make with "your free will"). But it sounds like we're in fact positing that demons are that good - I don't know how else to explain their 999,999-in-a-million success rate - so I think he is right.

I don't think the demon being wrong one in a million times changes much. 999,999 of the people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you're the one who isn't, you can observe that you're the demon's rare mistake and avoid cutting off your legs, but this just means you won the lottery - it's not a generally winning strategy.
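To put rough numbers on the "won the lottery" point (the payoffs here are made up; all that matters is that existing beats not existing and keeping your legs beats losing them), here's the ex ante comparison of the two policies:

```python
# Back-of-the-envelope comparison of the two policies in the leg-demon case.
# The payoffs and the 1-in-a-million error rate are stipulated for illustration.
DEMON_ERROR = 1e-6           # demon mispredicts a policy one time in a million
U_NOT_CREATED = 0.0
U_CREATED_NO_LEGS = 90.0     # created, then follows through and cuts
U_CREATED_WITH_LEGS = 100.0  # created by mistake, keeps legs

# Policy "cut": the demon predicts this correctly and creates you almost always.
eu_cut = (1 - DEMON_ERROR) * U_CREATED_NO_LEGS + DEMON_ERROR * U_NOT_CREATED

# Policy "refuse": you only exist at all when the demon slips up.
eu_refuse = (1 - DEMON_ERROR) * U_NOT_CREATED + DEMON_ERROR * U_CREATED_WITH_LEGS

print(f"expected utility, policy = cut:    {eu_cut:.6f}")    # ~89.99991
print(f"expected utility, policy = refuse: {eu_refuse:.6f}") # ~0.000100
```

The one refuser per million created people gets the best individual outcome, but ex ante the refusal policy is worth almost nothing, because refusers almost never get created at all.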

> Decision theories are intended as theories of what is rational for you to do. So it describes what choices are wise and which choices are foolish.

I don't understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.

On the Newcomb's Problem post, Eliezer explicitly said that he doesn't care why other people are doing decision theory, he would like to figure out a way to get more utility. Then he did that. I think if you disagree with his goal, you should be arguing "decision theory should be about looking good, not about getting utility" (so we can all laugh at you) rather than saying "Eliezer is confidently and egregiously wrong" and hiding the fact that one of your main arguments is that he said we should try to get utility instead of failing all the time, and then came up with a strategy that successfully does that.

I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer's goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your goal.

(what is your goal?)

My opinion on your response to the demon question is "The demon would never create you in the first place, so who cares what you think?" That is, I think your formulation of the problem includes a paradox - we assume the demon is always right, but also that you're in a perfect position to betray it and it can't stop you. What would actually happen is that the demon would create a bunch of people with amputation fetishes, plus me and Eliezer, who it knows wouldn't betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT-algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem - that is, that the demon has failed its omniscience check and created you, who are destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.

I don't think this is as degenerate a case as "a demon will torture everyone who believes FDT". If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.
