In short, Nick Bostrom's vulnerable world hypothesis states that humanity may at some point invent a technology that is highly destructive and, at the same time, cheap and easy to make. As a thought experiment, he uses "easy nukes": nuclear weapons that would not have to be made from plutonium, as in real life, but could instead be made from a battery and two pieces of glass. Bostrom argues that if everyone could make nuclear weapons at home, civilization would be destroyed by default, because terrorists, malcontents, and "folk who just want to see what would happen" would blow up most cities.
I see similarities between Bostrom's hypothesis and the way powerful LLMs have recently been open sourced. It did not take long after the release of ChatGPT and GPT-4 for several "open-source alternatives" to appear on the internet. Some of them can even be downloaded to your own computer for offline use. And then there was ChaosGPT, whose creator ordered such a model to "destroy humanity". In my mind, he was one of the "folk who just want to see what would happen".
Recently I have been thinking that, in the long term, the biggest challenge in AI safety may be the wide availability of future AGI systems. Even if we create a safely aligned AGI, what would prevent people from creating open-source alternatives that are free of safety constraints, and perhaps more capable precisely because of that? Advanced technology generally can't stay secret forever. And then we would have many future ChaosGPTs who "just want to see what would happen" and who could tell such a system to destroy humanity.
The old analogy used in AI safety circles, about making a wish to a genie and "getting what you asked for but not what you wanted", would no longer be relevant in this scenario. Instead, the potential future ChaosGPTs would get exactly what they asked for, and perhaps even what they truly wanted.
Does the community here also think this is a reasonable concern? I would like to hear your views in the comments, and perhaps start a discussion about future priorities in AI safety. Because if, in the far future, practically anyone could use an open-source AGI system and order it to destroy humanity, it would probably be impossible to completely prevent such malicious applications. Instead, the focus should be on increasing societal resilience against deliberate AI attacks and takeover attempts, perhaps through improved cybersecurity and through finding ways to counter the destructive technologies an AGI might employ, such as nanotechnology and genetically engineered pandemics. An aligned AGI, which would hopefully be created before any unaligned ones, would certainly help.
>Bostrom argues that if everyone could make nuclear weapons at home, civilization would be destroyed by default, because terrorists, malcontents, and "folk who just want to see what would happen" would blow up most cities.
Yes, and what would the world have to be like to change this?
It's terrible to think that the reason we are safe is that others are powerless. If EA seeks to maximize human potential, it is telling that we are so confident many people would destroy the world just because they could. I think focusing on people's real well-being is one way we can confront this.
Let's do the thought experiment: what would a world look like in which anyone had the power to destroy it at any time, and yet chose not to? Where no one ever made that choice?
What kind of care and support systems would exist? How would we respect each other? How would society be organized?
I think this is a good line of thinking, because it helps us understand how the world is vulnerable and how we can make it less so.
-Kristopher
It feels weird to me to hear that something is terrible to think. It might be terrible that we're only alive because people don't have the option to kill everyone else instantly, but it's also true. Thinking true thoughts isn't terrible.
If everyone had a button that could destroy all life on the planet, it would be unrealistic to expect that button to remain unpressed for more than a few hours. The most misanthropic person on Earth is very, very misanthropic. I'm not claiming that many people would press the button; the whole point is that it only takes one.
Given that people don't currently have such a button, it seems easier to think about how to prevent the button from existing than about how to make everyone agree not to press it. The button is a power no one should have.