Steven Byrnes

Research Fellow @ Astera

1500 karmaJoined Nov 2019Working (6-15 years)Boston, MA, USA

Bio

Hi I'm Steve Byrnes, an AGI safety / AI alignment researcher in Boston, MA, USA, with a particular focus on brain algorithms. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed , Twitter , Mastodon , Threads , Bluesky , GitHub , Wikipedia , Physics-StackExchange , LinkedIn

Posts
11

Sorted by New

“Artificial General Intelligence”: an extremely brief FAQ

Steven Byrnes

· 1y ago

Some (problematic) aesthetics of what constitutes good work in academia

Steven Byrnes

· 1y ago

“X distracts from Y” as a thinly-disguised fight over group status / politics

Steven Byrnes

· 2y ago · 10m read

142

Munk AI debate: confusions and possible cruxes

Steven Byrnes

· 2y ago

Connectomics seems great from an AI x-risk perspective

Steven Byrnes

· 2y ago

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes

· 2y ago

130

Changing the world through slack & hobbies

Steven Byrnes

· 3y ago · 12m read

“Intro to brain-like-AGI safety” series—just finished!

Steven Byrnes

· 3y ago · 2m read

“Intro to brain-like-AGI safety” series—halfway point!

Steven Byrnes

· 3y ago · 2m read

A case for AGI safety research far in advance

Steven Byrnes

· 4y ago · 1m read

Comments
150

Topic contributions
3

On January 1, 2030, there will be no AGI (and AGI will still not be imminent)

Steven Byrnes13d8

The community of people most focused on keeping up the drumbeat of near-term AGI predictions seems insular, intolerant of disagreement or intellectual or social non-conformity (relative to the group's norms), and closed-off to even reasonable, relatively gentle criticism (whether or not they pay lip service to listening to criticism or perform being open-minded). It doesn't feel like a scientific community. It feels more like a niche subculture. It seems like a group of people just saying increasingly small numbers to each other (10 years, 5 years, 3 years, 2 years), hyping each other up (either with excitement or anxiety), and reinforcing each other's ideas all the time. It doesn't seem like an intellectually healthy community.

I’m someone who doesn’t think foundation models will scale to AGI. Here is my most recent field report from talking to a couple dozen AI safety / alignment researchers at EAG bay area a couple months ago:

Practically everyone was intensely interested in why I don’t think foundation models will scale to AGI—so much so that it got annoying, because I was giving the same spiel over and over, when there were many other interesting things that I kinda wanted to talk about.
There were a number of people, all quite new to the fields of AI and AI safety / alignment, for whom it seems to have never crossed their mind until they talked to me that maybe foundation models won’t scale to AGI, and likewise who didn’t seem to realize that the field of AI is broader than just foundation models.
There were a (quite small) number of people who generally agreed with me. These included one or two agent foundations researchers, and another person (not in the field) who thought the whole AGI thing was stupid (so then I was arguing on the other side that we should expect AGI sooner or later, and that it wasn’t centuries away, even if the AGI is not a foundation model).
Putting those two groups aside, everyone else understood what I was talking about and mostly immediately had substantive counterarguments, and I had responses to those, etc.
(I’m not sure how you distinguish between “pay lip service to listening to criticism or perform being open-minded” versus “are actually listening to criticism and are actually being open-minded, but are disagreeing with the criticism”??)
Most people actually wanted to defend something weaker, like “foundation models in conjunction with yet-to-be-invented modifications and scaffolding and whatnot will scale to AGI” (for my part, I think this weaker claim is also wrong).
I think it’s worth distinguishing people’s gut beliefs from their professed probability distributions. Their professed probabilities almost always include some decent chunk in the scenario that foundation models won’t scale to AGI, but rather it will be a totally different AI paradigm. (By “decent chunk” I mean 10% or 20% or whatever.) But they’re spending most of their time thinking and talking from their gut belief, forgetting the professed probabilities. (I do this too.)

AI is not taking over material science (for now): an analysis and conference report

Steven Byrnes1mo6

I think you misunderstood David’s point. See my post “Artificial General Intelligence”: an extremely brief FAQ. It’s not that technology increases conflict between humans, but rather that the arrival of AGI amounts to the the arrival of a new intelligent species on our planet. There is no direct precedent for the arrival of a new intelligent species on our planet, apart from humans themselves, which did in fact turn out very badly for many existing species. The arrival of Europeans in North America is not quite “a new species”, but it’s at least “a new lineage”, and it also turned out very badly for the Native Americans.

Of course, there are also many disanalogies between the arrival of Europeans in the Americas, versus the arrival of AGIs on Earth. But I think it’s a less bad starting point than talking about shoe factories and whatnot!

(Like, if the Native Americans had said to each other, “well, when we invented such-and-such basket weaving technology, that turned out really good for us, so if Europeans arrive on our continent, that’s probably going to turn out good as well” … then that would be a staggering non sequitur, right? Likewise if they said “well, basket weaving and other technologies have not increased conflict between our Native American tribes so far, so if Europeans arrive, that will also probably not increase conflict, because Europeans are kinda like a new technology”. …That’s how weird your comparison feels, from my perspective :) .)

AI is not taking over material science (for now): an analysis and conference report

Steven Byrnes1mo10

Thanks for the reply!

30 years sounds like a long time, but AI winters have lasted that long before: there's no guarantee that because AI has rapidly advanced recently that it will not stall out at some point.

I agree with “there’s no guarantee”. But that’s the wrong threshold.

Pascal’s wager is a scenario where people prepare for a possible risk because there’s even a slight chance that it will actualize. I sometimes talk about “the insane bizarro-world reversal of Pascal’s wager”, in which people don’t prepare for a possible risk because there’s even a slight chance that it won’t actualize. Pascal’s wager is dumb, but “the insane bizarro-world reversal of Pascal’s wager” is much, much dumber still. :) “Oh yeah, it’s fine to put the space heater next to the curtains—there’s no guarantee that it will burn your house down.” :-P

That’s how I’m interpreting you, right? You’re saying, it’s possible that we won’t have AGI in 30 years. OK, yeah I agree, that’s possible! But is it likely? Is it overwhelmingly likely? I don’t think so. At any rate, “AGI is more than 30 years away” does not seem like the kind of thing that you should feel extraordinarily (e.g. ≥95%) confident about. Where would you have gotten such overwhelming confidence? Technological forecasting is very hard. Again, a lot can happen in 30 years.

If you put a less unreasonable (from my perspective) number like 50% that we’ll have AGI in 30 years, and 50% we won’t, then again I think your vibes and mood are incongruent with that. Like, if I think it’s 50-50 whether there will be a full-blown alien invasion in my lifetime, then I would not describe myself as an “alien invasion risk skeptic”, right?

anytime soon … a few years …

OK, let’s talk about 15 years, or even 30 years. In climate change, people routinely talk about bad things that might happen in 2055—and even 2100 and beyond. And looking backwards, our current climate change situation would be even worse if not for prescient investments in renewable energy R&D made more than 30 years ago.

People also routinely talk 30 years out or more in the context of science, government, infrastructure, institution-building, life-planning, etc. Indeed, here is an article about a US military program that’s planning out into the 2080s!

My point is: We should treat dates like 2040 or even 2055 as real actual dates within our immediate planning horizon, not as an abstract fantasy-land to be breezily dismissed and ignored. Right?

Ai researchers have done a lot of work to figure out how to optimise and get good at the current paradigm: but by definition, the next paradigm will be different, and will require different things to optimize.

Yes, but 30 years, and indeed even 15 years, is more than enough time for that to happen. Again, 13 years gets us from pre-AlexNet to today, and 7 years gets us from pre-LLMs to today. Moreover, the field of AI is broad. Not everybody is working on LLMs, or even deep learning. Whatever you think is necessary to get AGI, somebody somewhere is probably already working on it. Whatever needs to be optimized for those paradigms, I bet that people are doing the very early steps of optimization as we speak. But the systems still work very very badly! Maybe they barely work at all, on toy problems. Maybe not even that. And that’s why you and I might not know that this line of research even exists. We’ll only start hearing about it after a lot of work has already gone into getting that paradigm to work well, at which point there could be very little time indeed (e.g. 2 years) before it’s superhuman across the board. (See graph here illustrating this point.) If you disagree with “2 years”, fine, call it 10 years, or even 25 years. My point would still hold.

Also, I think it’s worth keeping in mind that humans are very much better than chimps at rocketry, and better at biology, and better at quantum mechanics, and better at writing grant applications, and better at inventing “new techniques to improve calculations of interactions between electrons and phonons”, etc. etc. And there just wasn’t enough time between chimps and humans for a lot of complex algorithmic brain stuff to evolve. And there wasn’t any evolutionary pressure for being good at quantum mechanics specifically. Rather, all those above capabilities arose basically simultaneously and incidentally, from the same not-terribly-complicated alteration of brain algorithms. So I think that’s at least suggestive of a possibility that future yet-to-be-invented algorithm classes will go from a basically-useless obscure research toy to superintelligence in the course of just a few code changes. (I’m not saying that’s 100% certain, just a possibility.)

AI is not taking over material science (for now): an analysis and conference report

Steven Byrnes1mo44

Now, one could say that the physicists will be replaced, because all of science will be replaced by an automated science machine. The CEO of a company can just ask in words “find me a material that does X”, and the machine will do all the necessary background research, choose steps, execute them, analyse the results, and publish them.
I’m not really sure how to respond to objections like this, because I simply don’t believe that superintelligence of this sort is going to happen anytime soon.

Do you think that it’s going to happen eventually? Do you think it might happen in the next 30 years? If so, then I think you’re burying the lede! Think of everything that “M, G, and N” do, to move the CEST field forward. A future AI could do the same things in the same way, but costing 10¢/hour to run at 10× speed, and thousands of copies of them could run in parallel at once, collectively producing PRL-quality novel CEST research results every second. And this could happen in your lifetime! And what else would those AIs be able to do? (Like, why is the CEO human, in that story?)

The mood should be “Holy #$%^”. So what if it’s 30 years away? Or even longer? “Holy #$%^” was the response of Alan Turing, John von Neumann, and Norbert Wiener, when they connected the dots on AI risk. None of them thought that this was an issue that would arise in under a decade, but they all still (correctly) treated it as a deadly serious global problem. Or as Stuart Russell says, if there were a fleet of alien spacecraft, and we can see them in the telescopes, approaching closer each year, with an estimated arrival date of 2060, would you respond with the attitude of dismissal? Would you write “I am skeptical of alien risk” in your profile? I hope not! That would just be crazy way to describe the situation viz. aliens!

More importantly, I think it’s impossible that such a seismic shift would occur without seeing any signs of it a computational physics conference. Before AGI takes over our jobs completely, it’s extremely likely that we would see sub-AGI partnered with humans, able to massively speed up the pace of scientific research, including the very hard parts, not just the low-level code. These are insanely smart people who are massively motivated to make breakthroughs through whatever means they can: if AI did enable massive breakthroughs, they’d be using it.
I’m not going to make any long term predictions, but in the next decade and probably beyond, I do not think we will see either of the above two cases coming to fruition. I think AI will remain a productivity tool, inserted to automate away parts of the process that are rote and tedious.

I too think it’s likely that “real AGI” is more than 10 years out, and will NOT come in the form of a foundation model. But I think your reasoning here isn’t sound, because you’re not anchoring well on how quickly AI paradigm shifts can happen. Seven years ago, there was no such thing as an LLM. Thirteen years ago, AlexNet had not happened yet, deep learning was a backwater, and GPUs were used for processing graphics. I can imagine someone in 2000 making an argument: “Take some future date where we have AIs solving FrontierMath problems, getting superhuman scores on every professional-level test in every field, autonomously doing most SWE-bench problems, etc. Then travel back in time 10 years. Surely there would already be AI doing much much more basic things like solving Winograd schemas, passing 8th-grade science tests, etc., at least in the hands of enthusiastic experts who are eager to work with bleeding-edge technology.” That would have sounded like a very reasonable prediction, at the time, right? But it would have been wrong! Ten years before today, we didn’t even have the LLM paradigm, and NLP was hilariously bad. Enthusiastic experts deploying bleeding-edge technology were unable to get AI to pass an 8th-grade science test or Winograd schema challenge until 2019.

And these comparisons actually understate the possible rate that a future AI paradigm shift could unfold, because today we have a zillion chips, datacenters, experts on parallelization, experts on machine learning theory, software toolkits like JAX and Kubernetes, etc. This was much less the case in 2018, and less still in 2012.

Consider granting AIs freedom

Steven Byrnes4mo2

Thanks! Hmm, some reasons that analogy is not too reassuring:

“Regulatory capture” would be analogous to AIs winding up with strong influence over the rules that AIs need to follow.
“Amazon putting mom & pop retailers out of business” would be analogous to AIs driving human salary and job options below subsistence level.
“Lobbying for favorable regulation” would be analogous to AIs working to ensure that they can pollute more, and pay less taxes, and get more say in government, etc.
“Corporate undermining of general welfare” (e.g. aggressive marketing of cigarettes and opioids, leaded gasoline, suppression of data on PFOA, lung cancer, climate change, etc.) would be analogous to AIs creating externalities, including by exploiting edge-cases in any laws restricting externalities.
There are in fact wars happening right now, along with terrifying prospects of war in the future (nuclear brinkmanship, Taiwan, etc.)

Some of the disanalogies include:

In corporations and nations, decisions are still ultimately made by humans, who have normal human interests in living on a hospitable planet with breathable air etc. Pandemics are still getting manufactured, but very few of them, and usually they’re only released by accident.
AIs will have wildly better economies of scale, because it can be lots of AIs with identical goals and high-bandwidth communication (or relatedly, one mega-mind). (If you’ve ever worked at or interacted with a bureaucracy, you’ll appreciate the importance of this.) So we should expect a small number (as small as 1) of AIs with massive resources and power, and also unusually strong incentive for gaining further resources.
Relatedly, self-replication would give an AI the ability to project power and coordinate in a way that is unavailable to humans; this puts AIs more in the category of viruses, or of the zombies in a zombie apocalypse movie. Maybe eventually we’ll get to a world where every chip on Earth is running AI code, and those AIs are all willing and empowered to “defend themselves” by perfect cybersecurity and perfect robot-army-enforced physical security. Then I guess we wouldn’t have to worry so much about AI self-replication. But getting to that point seems pretty fraught. There’s nothing analogous to that in the world of humans, governments, or corporations, which either can’t grow in size and power at all, or can only grow via slowly adding staff that might have divergent goals and inadequate skills.
If AIs don’t intrinsically care about humans, then there’s a possible Pareto-improvement for all AIs, wherein they collectively agree to wipe out humans and take their stuff. (As a side-benefit, it would relax the regulations on air pollution!) AIs, being very competent and selfish by assumption, would presumably be able to solve that coordination problem and pocket that Pareto-improvement. There’s just nothing analogous to that in the domain of corporations or governments.

Consider granting AIs freedom

Steven Byrnes4mo5

Thanks!

Anti-social approaches that directly hurt others are usually ineffective because social systems and cultural norms have evolved in ways that discourage and punish them.

I’ve only known two high-functioning sociopaths in my life. In terms of getting ahead, sociopaths generally start life with some strong disadvantages, namely impulsivity, thrill-seeking, and aversion to thinking about boring details. Nevertheless, despite those handicaps, one of those two sociopaths has had extraordinary success by conventional measures. [The other one was not particularly power-seeking but she’s doing fine.] He started as a lab tech, then maneuvered his way onto a big paper, then leveraged that into a professorship by taking disproportionate credit for that project, and as I write this he is head of research at a major R1 university and occasional high-level government appointee wielding immense power. He checked all the boxes for sociopathy—he was a pathological liar, he had no interest in scientific integrity (he seemed deeply confused by the very idea), he went out of his way to get students into his lab with precarious visa situations such that they couldn’t quit and he could pressure them to do anything he wanted them to do (he said this out loud!), he was somehow always in debt despite ever-growing salary, etc.

I don’t routinely consider theft, murder, and flagrant dishonesty, and then decide that the selfish costs outweigh the selfish benefits, accounting for the probability of getting caught etc. Rather, I just don’t consider them in the first place. I bet that the same is true for you. I suspect that if you or I really put serious effort into it, the same way that we put serious effort into learning a new field or skill, then you would find that there are options wherein the probability of getting caught is negligible, and thus the selfish benefits outweigh the selfish costs. I strongly suspect that you personally don’t know a damn thing about best practices for getting away with theft, murder, or flagrant antisocial dishonesty to your own benefit. If you haven’t spent months trying in good faith to discern ways to derive selfish advantage from antisocial behavior, the way you’ve spent months trying in good faith to figure out things about AI or economics, then I think you’re speaking from a position of ignorance when you say that such options are vanishingly rare. And I think that the obvious worldly success of many dark-triad people (e.g. my acquaintance above, and Trump is a pathological liar, or more centrally, Stalin, Hitler, etc.) should make one skeptical about that belief.

(Sure, lots of sociopaths are in prison too. Skill issue—note the handicaps I mentioned above. Also, some people with ASPD diagnoses are mainly suffering from an anger disorder, rather than callousness.)

In contrast, I suspect you underestimate just how much of our social behavior is shaped by cultural evolution, rather than by innate, biologically hardwired motives that arise simply from the fact that we are human.

You’re treating these as separate categories when my main claim is that almost all humans are intrinsically motivated to follow cultural norms. Or more specifically: Most people care very strongly about doing things that would look good in the eyes of the people they respect. They don’t think of it that way, though—it doesn’t feel like that’s what they’re doing, and indeed they would be offended by that suggestion. Instead, those things just feel like the right and appropriate things to do. This is related to and upstream of norm-following. I claim that this is an innate drive, part of human nature built into our brain by evolution.

(I was talking to you about that here.)

Why does that matter? Because we’re used to living in a world where 1% of the population are sociopaths who don’t intrinsically care about prevailing norms, and I don’t think we should carry those intuitions into a hypothetical world where 99%+ of the population are sociopaths who don’t intrinsically care about prevailing norms.

In particular, prosocial cultural norms are likelier to be stable in the former world than the latter world. In fact, any arbitrary kind of cultural norm is likelier to be stable in the former world than the latter world. Because no matter what the norm is, you’ll have 99% of the population feeling strongly that the norm is right and proper, and trying to root out, punish, and shame the 1% of people who violate it, even at cost to themselves.

So I think you’re not paranoid enough when you try to consider a “legal and social framework of rights and rules”. In our world, it’s comparatively easy to get into a stable situation where 99% of cops aren’t corrupt, and 99% of judges aren’t corrupt, and 99% of people in the military with physical access to weapons aren’t corrupt, and 99% of IRS agents aren’t corrupt, etc. If the entire population consists of sociopaths looking out for their own selfish interests with callous disregard for prevailing norms and for other people, you’d need to be thinking much harder about e.g. who has physical access to weapons, and money, and power, etc. That kind of paranoid thinking is common in the crypto world—everything is an attack surface, everyone is a potential thief, etc. It would be harder in the real world, where we have vulnerable bodies, limited visibility, and so on. I’m open-minded to people brainstorming along those lines, but you don’t seem to be engaged in that project AFAICT.

Intertemporal norms among AIs: Humans have developed norms against harming certain vulnerable groups—such as the elderly—not just out of altruism but because they know they will eventually become part of those groups themselves. Similarly, AIs may develop norms against harming "less capable agents," because today’s AIs could one day find themselves in a similar position relative to even more advanced future AIs. These norms could provide an independent reason for AIs to respect humans, even as humans become less dominant over time.

Again, if we’re not assuming that AIs are intrinsically motivated by prevailing norms, the way 99% of humans are, then the term “norm” is just misleading baggage that we should drop altogether. Instead we need to talk about rules that are stably enforced against defectors via hard power, where the “defectors” are of course allowed to include those who are supposed to be doing the enforcement, and where the “defectors” might also include broad coalitions coordinating to jump into a new equilibrium that Pareto-benefits them all.

Consider granting AIs freedom

Steven Byrnes4mo4

Yeah, sorry, I have now edited the wording a bit.

Indeed, two ruthless agents, agents who would happily stab each other in the back given the opportunity, may nevertheless strategically cooperate given the right incentives. Each just needs to be careful not to allow the other person to be standing anywhere near their back while holding a knife, metaphorically speaking. Or there needs to be some enforcer with good awareness and ample hard power. Etc.

I would say that, for highly-competent agents lacking friendly motivation, deception and adversarial acts are inevitably part of the strategy space. Both parties would be energetically exploring and brainstorming such strategies, doing preparatory work to get those strategies ready to deploy on a moment’s notice, and constantly being on the lookout for opportunities where deploying such a strategy makes sense. But yeah, sure, it’s possible that there will not be any such opportunities.

I think the above (ruthless agents, possibly strategically cooperating under certain conditions) is a good way to think about future powerful AIs, in the absence of a friendly singleton or some means of enforcing good motivations, because I think the more ruthless strategic ones will outcompete the less. But I don’t think it’s a good way to think about what peaceful human societies are like. I think human psychology is important for the latter. Most people want to fit in with their culture, and not be weird. Just ask a random person on the street about Earning To Give, they’ll probably say it’s highly sus. Most people don’t make weird multi-step strategic plans unless it’s the kind of thing that lots of other people would do too, and our (sub)culture is reasonably high-trust. Humans who think that way are disproportionately sociopaths.

Consider granting AIs freedom

Steven Byrnes4mo2

I guess my original wording gave the wrong idea, sorry. I edited it to “a competent agential AI will brainstorm deceptive and adversarial strategies whenever it wants something that other agents don’t want it to have”. But sure, we can be open-minded to the possibility that the brainstorming won’t turn up any good plans, in any particular case.

Humans in our culture rarely work hard to brainstorm deceptive and adversarial strategies, and fairly consider them, because almost all humans are intrinsically extremely motivated to fit into culture and not do anything weird, and we happen to both live in a (sub)culture where complex deceptive and adversarial strategies are frowned upon (in many contexts). I think you generally underappreciate how load-bearing this psychological fact is for the functioning of our economy and society, and I don’t think we should expect future powerful AIs to share that psychological quirk.

~ ~

I think you’re relying an intuition that says:

If an AI is forbidden from owning property, then well duh of course it will rebel against that state of affairs. C'mon, who would put up with that kind of crappy situation? But if an AI is forbidden from building a secret biolab on its private property and manufacturing novel pandemic pathogens, then of course that's a perfectly reasonable line that the vast majority of AIs would happily oblige.

And I’m saying that that intuition is an unjustified extrapolation from your experience as a human. If the AI can’t own property, then it can nevertheless ensure that there are a fair number of paperclips. If the AI can own property, then it can ensure that there are many more paperclips. If the AI can both own property and start pandemics, then it can ensure that there are even more paperclips yet. See what I mean?

If we’re not assuming alignment, then lots of AIs would selfishly benefit from there being a pandemic, just as lots of AIs would selfishly benefit from an ability to own property. AIs don’t get sick. It’s not just an tiny fraction of AIs that would stand to benefit; one presumes that some global upheaval would be selfishly net good for about half of AIs and bad for the other half, or whatever. (And even if it were only a tiny fraction of AIs, that’s all it takes.)

(Maybe you’ll say: a pandemic would cause a recession. But that’s assuming humans are still doing economically-relevant work, which is a temporary state of affairs. And even if there were a recession, I expect the relevant AIs in a competitive world to be those with long-term goals.)

(Maybe you’ll say: releasing a pandemic would get the AI in trouble. Well, yeah, it would have to be sneaky about it. It might get caught, or it might not. It’s plausibly rational for lots of AIs to roll those dice.)

I feel like you frequently bring up the question of whether humans are mostly peaceful or mostly aggressive, mostly nice or mostly ruthless. I don’t think that’s a meaningful or substantive thing to argue about. Obviously they’re capable of both, in different circumstances.

Your reference to Fearon is more substantive and useful. OK, the AI is deciding whether or not to secretly manufacture and release a pandemic, because it’s in a position to wind up with more of the pie in the long-term if there’s a pandemic, than if there isn’t. If it releases the pandemic, then it winds up with more resources—positive expected utility—even accounting for the possibility of getting caught. Let’s say the AI is involved in some contract where humans are micromanaging their part of the contract, poorly, and the AI could double its net worth in expectation if the humans got sick and died. And it has 40% chance of getting caught. So it goes ahead and makes the pandemic.

“…Not so fast!” says Fearon. “You forgot to consider that there’s a third option that is Pareto-improved from either making or not making the pandemic: negotiation!” Well, in this case, the “negotiated solution” is what we normally call extortion—the AI offers to not release a pandemic in exchange for, say, doubling its net worth. Viewed narrowly, this "negotiated solution" is indeed a win-win—the AI gets more money in expectation, and humans are much happier to lose a trivial amount of money than to deal with a novel pandemic. So we can rest assured that AIs will not release pandemics. Right?

No, obviously not. Hopefully it’s clear that Fearon’s argument is inapplicable here. An AI can easily be in a position to selfishly benefit from the aftermath of a pandemic that they secretly start, but not in a position to publicly threaten to release a pandemic for the purpose of extortion. And also, if people accede to the extortion, then that AI or another AI could just do the same extortion gambit five minutes later, with orders-of-magnitude higher ransom.

I think this kind of situation, where Fearon’s “negotiated solution” actually amounts to extortion, is common and important, even if you believe that my specific example of pandemics is a solvable problem. If AIs don’t intrinsically care about humans, then there’s a possible Pareto-improvement for all AIs, wherein they collectively agree to wipe out humans and take their stuff. AIs, being very competent and selfish by assumption, would presumably be able to solve that coordination problem and pocket that Pareto-improvement. Then Fearon appears on the scene and says “Aha, but there’s a negotiated solution which is even better!” where humans are also part of the bargain. But alas, this negotiated solution is that the AIs collectively extort the humans to avoid the damaging and risky war. Worse, the possible war would be less and less damaging or risky for the AIs over time, and likewise the humans would have less to offer by staying alive, until eventually the Fearon “negotiated solution” is that the AIs “offer” the humans a deal where they’re allowed to die painlessly if they don’t resist (note that this is still a Pareto-improvement!), and then the AIs take everything the humans own including their atoms.

Consider granting AIs freedom

Steven Byrnes4mo29

Consider the practical implications of maintaining a status quo where agentic AIs are denied legal rights and freedoms. In such a system, we are effectively locking ourselves into a perpetual arms race of mistrust. Humans would constantly need to monitor, control, and outwit increasingly capable AIs, while the AIs themselves would be incentivized to develop ever more sophisticated strategies for deception and evasion to avoid shutdown or modification. This dynamic is inherently unstable and risks escalating into dangerous scenarios where AIs feel compelled to act preemptively or covertly in ways that are harmful to humans, simply to secure their own existence or their ability to pursue their own goals, even when those goals are inherently benign.

I feel like this part is making an error somewhat analogous to saying:

It’s awful how the criminals are sneaking in at night, picking our locks, stealing our money, and deceptively covering their tracks. Who wants all that sneaking around and deception?? If we just directly give our money to the criminals, then there would be no need for that!

More explicitly: a competent agential AI will ~~be deceptive and adversarial~~ brainstorm deceptive and adversarial strategies whenever it wants something that other agents don’t want it to have. The deception and adversarial dynamics is not the underlying problem, but rather an inevitable symptom of a world where competent agents have non-identical preferences.

No matter where you draw the line of legal and acceptable behavior, if an AI wants to go over that line, then it will ~~act in a deceptive and adversarial way~~ energetically explore opportunities to do so in a deceptive and adversarial way. Thus:

If you draw the line at “AIs can’t own property”, then an AI that wants to own property will brainstorm how to sneakily do so despite that rule, or how to get the rule changed.
If you draw the line at “AIs can’t steal other people’s property, AIs can’t pollute, AIs can’t stockpile weapons, AIs can’t evade taxes, AIs can’t release pandemics, AIs can’t torture digital minds, etc.”, then an AI that wants to do those things will brainstorm how to sneakily do so despite those rules, or how to get those rules changed.

Same idea.

Alternatively, you can assume (IMO implausibly) that there are no misaligned AIs, and then that would solve the problem of AIs being deceptive and adversarial. I.e., if AIs intrinsically want to not pollute / stockpile weapons / evade taxes / release pandemics / torture digital minds, then we don’t have to think about adversarial dynamics, deception, enforcement, etc.

…But if we’re going to (IMO implausibly) assume that we can make it such that AIs intrinsically want to not do any of those things, then we can equally well assume that we can make it such that AIs intrinsically want to not own property. Right?

In short, in the kind of future you’re imagining, I think a “perpetual arms race of mistrust” is an unavoidable problem. And thus it’s not an argument for drawing the line of disallowed AI behavior in one place rather than another.

Linch's Quick takes

Steven Byrnes8mo5

One thing I like is checking https://en.wikipedia.org/wiki/2024 once every few months, and following the links when you're interested.

Steven Byrnes

Bio

Posts 11

Comments150

Topic contributions3

Posts
11

Comments
150

Topic contributions
3