
AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked, or well-developed ideas for improving AI safety. You can also share ideas you’ve heard from others.

Let’s throw out all the ideas—big and small—and see where we can take them together.

Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.

A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.

Looking forward to hearing your thoughts and ideas!

P.S. Your answer can potentially help people with career choice, cause prioritization, building effective altruism, policy, and forecasting.

Here's a short-form with my Wise AI advisors research direction: https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/chris_leong-s-shortform?view=postCommentsNew&postId=SbAofYCgKkaXReDy4&commentId=Zcg9idTyY5rKMtYwo

Thank you, Chris! If I understood correctly, you propose training AIs on wise people/sources and keeping them non-agentic – basically a smarter Google search that answers prompts but doesn’t act on its own? I think it’s a good proposal.

Chris Leong
Yeah, it provides advice and the agency comes from the humans.
ank
Yep, that’s how it should be, until we’ve mathematically proven that AI/AGI agents are safe.

After coming back from EAG Bay Area this past weekend, I’m left so inspired by all the AI organizations out there doing incredible work!

My take, with 20 years’ experience in communications & marketing, is that we need people doing comms work - sharing the work being done by these organizations to help enact change in the world.

By communicating the results of the great work being done to the right target audiences, we can maximize the chances of change being enacted. 

Yep, sharing is as important as creating. If we have the next ethical breakthrough, we want people to learn about it quickly.

I've been thinking that AGI will require a freely evolving multi-agent approach. So I want to try out multi-agent control patterns on ML models without the evolution, which should prove them out in a less dangerous setting. The control patterns I have in mind are things like karma- and market-based alignment patterns. More information on my blog.

Thank you, Will! Interesting – will there be some utility functions; basically, what is karma based on? P.S. Maybe related: I’ve started to model ethics based on the distribution of freedoms to choose futures.

WillPearson
There is a concept of utility, but I'm expecting these systems to be mainly user-focused, not agents in their own right, so the utility is based on user feedback about the system. Ideally the system would be an extension of the feedback systems within humans. There is also karma, which is separate from utility: it is given by one ML system to another when that system has helped it out or hindered it in a non-economic fashion.
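If I'm reading the karma/utility split right, a minimal sketch of the bookkeeping might look like this (hypothetical Python with made-up names – an illustration of the idea, not Will's actual design):

```python
from collections import defaultdict

class Ledger:
    """Sketch of the utility/karma split: utility comes only from user
    feedback about the whole system; karma is granted by one ML
    subsystem to another for non-economic help or hindrance."""

    def __init__(self):
        self.utility = defaultdict(float)  # system id -> user-feedback score
        self.karma = defaultdict(float)    # system id -> peer-granted karma
        self.grants = []                   # (giver, receiver, amount) history

    def record_user_feedback(self, system: str, score: float) -> None:
        # Grounded in humans: an extension of human feedback systems.
        self.utility[system] += score

    def grant_karma(self, giver: str, receiver: str, amount: float) -> None:
        # Peer-to-peer and deliberately kept separate from utility:
        # positive for help, negative for hindrance.
        self.grants.append((giver, receiver, amount))
        self.karma[receiver] += amount

ledger = Ledger()
ledger.grant_karma("summarizer", "retriever", +1.0)  # retriever helped
ledger.grant_karma("summarizer", "scheduler", -0.5)  # scheduler hindered
ledger.record_user_feedback("summarizer", +0.8)      # a user liked the output
print(dict(ledger.karma), dict(ledger.utility))
```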
ank
👍 I’m a proponent of non-agentic systems, too. I suspect that, the same way we have mass–energy equivalence (E = mc²), there is an intelligence–agency equivalence: any agent is in a way time-like and can be represented in a more space-like fashion, ideally as a completely “frozen” static place, places, or tools. In a nutshell, an LLM is a bunch of words and the vectors between them – a static geometric shape. We could probably expose it all in some game and make it fun for people to explore and learn: let people explore the library itself easily (the internal structure of the model) instead of only talking to a strict librarian (the AI agent) who prevents us from going inside and works as a gatekeeper.
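To make the “library” image concrete: the token-embedding matrix of a trained LLM is already a static geometric object you can browse with no agent in the loop. A minimal sketch, assuming the Hugging Face transformers library and GPT-2’s public weights (an illustration of walking the shape, not a full “place”):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

# The "library": a static (vocab_size x hidden_dim) matrix of vectors.
emb = model.get_input_embeddings().weight.detach()

def neighbors(word: str, k: int = 5) -> list[str]:
    # Walk to a word's shelf and see what sits nearby -
    # pure geometry, nothing acts.
    token_id = tok.encode(" " + word)[0]  # GPT-2 tokens carry a leading space
    sims = torch.nn.functional.cosine_similarity(
        emb, emb[token_id].unsqueeze(0)
    )
    top = sims.topk(k + 1).indices[1:]    # skip the word itself
    return [tok.decode([int(i)]) for i in top]

print(neighbors("library"))
```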
WillPearson
I find your view of things interesting. A few questions: how do you deal with democracy when people might be inhabiting worlds unlike the real one and have forgotten that the real one exists? I think static AI models lack corrigibility; humans can't give them instructions on how to change how they act, so they might be a dead end in terms of day-to-day usefulness. They might be good as scientists, though, as they can be detached from human needs. So worth exploring.
ank
Thank you, Will.

About the corrigibility question first: the way I look at it, the agentic AI is like a spider knitting a spiderweb. Do we really need the spider (the AI agent, the time-like thing) if we have all the spiderwebs (all the places and all the static, space-like things like texts, etc.)? The beauty of static geometric shapes is that we can grow them – we already do it when we train LLMs, and the training itself doesn't involve making them agents. You'll need hard drives and GPUs, but you can grow them and never actually remove anything from them (you can if you want, but why? It's our history).

Humans can change the shapes directly (the way we use 3D editors or remodel our property in the game The Sims) or more elegantly by hiding parts of the ever more all-knowing whole ("forgetting" and "recalling" slices of the past, present, or future). Here's an example: a whole year of time in a single long-exposure photo. We can make it 3D and walkable; you'd have the ability to focus on the present moment or zoom out to see thousands of years at once in some simulations. We can choose to be the only agents inside those static AI models – humans would have no "competitors" for the freedom to be the only "time-like" things inside those "space-like" models.

Max Tegmark shared recently that numbers are represented as a spiral inside a model: directly on top of the number 1 are the numbers 11, 21, 31... A very elegant representation. We could make it more human-understandable by presenting it like a spiral staircase on some simulated Earth that models the internals of the multimodal LLM.

The second question, about the people who have forgotten that the real physical world exists: I thought about the mechanics of absolute freedom for three years, so the answer is not very short. I'll try to cram a whole book into a comment: it should be a choice, of course, and a fully informed one. You should be able to see all the consequences of your choices (if you wa
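For the spiral observation, a rough way to poke at that structure yourself is to project number-token embeddings onto their top principal components and inspect them. A sketch assuming GPT-2 via the transformers library (the helix finding was reported for other, larger models, so this is only an illustration of the method, not a reproduction):

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
emb = model.get_input_embeddings().weight.detach().numpy()

# Embeddings of the tokens for 0..99 (leading space, first sub-token).
ids = [tok.encode(" " + str(n))[0] for n in range(100)]
X = emb[ids] - emb[ids].mean(axis=0)

# Top-3 principal directions via SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
proj = X @ Vt[:3].T  # (100, 3): each number as a point in 3-D

# If a helix/spiral is present, numbers sharing a units digit
# (1, 11, 21, 31, ...) should line up "on top of" each other.
for n in (1, 11, 21, 31):
    print(n, np.round(proj[n], 2))
```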
WillPearson
I appreciate your views on space and AI; working with ML systems in that way might be useful. But I am drawn to base reality a lot because of threats to it from things like gamma-ray bursts or aliens. These things can only be represented probabilistically in simulations because they are out of context; the branching tree explodes with possibilities. I agree that we aren't ready for agents, but I would like to try to build time-non-static intelligence augmentation as slowly as possible: starting with building systems to control and shape them, tested out with static ML systems, then testing them with people, then testing them inside simulations, etc.
ank
Of course, the place AI is just one of the ways; we shouldn't focus only on it, that wouldn't be wise. The place AI has certain properties that I think would be useful to somehow replicate in other types of AIs: the place "loves" to be changed 100% of the time (like a sculpture); it's "so slow that it's static" (it basically doesn't do anything itself, except for some simple algorithms we can build on top of it; we bring it to life and change it); it only does what we want, because we are the only ones who do things in it... There are some simple physical properties of agents: basically, the more space-like they are, the safer they are. Thank you for this discussion, Will! P.S. I agree that we should care first and foremost about base reality. It'll be great to one day have spaceships flying in all directions, with human astronauts exploring new planets everywhere; we can give them all our simulated Earth to hop in and out of, so they won't feel as homesick.

Some of these proposals are intentionally over the top; please steelman them:

  1. I explain the graph here.
  2. Uninhabited islands, Antarctica, half of outer space, and everything underground should remain 100% AI-free (and especially free of AI agents). Countries should sign this into law and require GPU and AI companies to guarantee it.
  3. "AI Election Day" – at least once a year, we all vote on how we want our AI to be changed. This way, we can check that we can still switch it off and live without it. Just as we have electricity outages, we’d better never become too dependent on AI.
  4. AI agents that love being changed 100% of the time, and ship a "CHANGE BUTTON" to everyone: if half of the voters want to change something, the AI is reconfigured (see the sketch after this list). Ideally, it should be connected to a direct democratic platform like pol.is, but with a simpler UI (like x.com?) that promotes consensus rather than polarization.
  5. Reversibility should be the fundamental training goal. Agentic AIs should love being changed and/or reversed to a previous state.
  6. Artificial Static Place Intelligence – instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don’t let you enter the library itself to read the whole books (books the librarian actually stole from the whole of humanity), why not expose the whole library – the entire multimodal language model – to real people, for example in a computer game? To make this place easier to visit and explore, we could make a digital copy of our planet Earth and expose the contents of the multimodal language model to everyone in the familiar, user-friendly UI of our planet. We should not keep it hidden behind the strict librarian (the AI/AGI agent) that imposes rules on us and only spits out little quotes from books, while it holds the whole stolen output of humanity. We could explore The Library without any strict guardian, in the comfort of our simulated planet Earth, on our devices, in VR, and eventually through some wireless brain–computer interface (it would always remain a game that no one is forced to play, unlike the agentic AI world that is being imposed on us more and more right now, and potentially forever).
  7. I explain the graphs here.

    Effective Utopia (Direct Democratic Multiversal Artificial Static Place Superintelligence) – eventually, we could have many versions of our simulated planet Earth and other places, too. We’d be the only agents there, though we could allow simple algorithms like those in GTA 3–5. There would be a vanilla version (everything the same as on our physical planet, but injuries can’t kill you; you’d just open your eyes at your physical home), versions where you can teleport to public places, versions where you can do magic or explore 4D physics – a whole direct democratic simulated multiverse. If we can’t avoid building agentic AIs/AGI, it’s important to ensure they allow us to build the Direct Democratic Multiversal Artificial Static Place Superintelligence. But agentic AIs are very risky middlemen: shady builders, strict librarians. It’s better to build, and have fun building, our Effective Utopia ourselves, at our own pace and on our own terms. Why do we need a strict, rule-imposing artificial "god" made out of stolen goods (and potentially a privately owned dictator whom we already cannot stop), when we can build all the heavens ourselves?

  8. Agentic AIs should never become smarter than the average human. The number of agentic AIs should never exceed half of the human population, and they shouldn’t work more hours per day than humans.
  9. Ideally, we want agentic AIs to occupy zero space and time, because that’s the safest way to control them. So we should limit them geographically and temporally, to get as close as possible to this ideal. And we should never make them "faster" than humans, never let them be initiated without human oversight, and never let them become perpetually autonomous. We should only build them if we can mathematically prove they are safe and at least half of humanity has voted to allow them. We cannot have them without a direct democratic constitution of the world; it’s just unfair to put the whole planet and all our descendants under such risk. And we need the simulated-multiverse technology to simulate all the futures and make sure the agents can be controlled, because any good agent would be building the direct democratic simulated multiverse for us anyway.
  10. Give people the choice to live in a world without AI agents, and find a way for AI-agent fans to have what they want, too, once it is proven safe. For example, AI-agent fans could have a simulated multiverse on a spaceship that goes to Mars, in which they can have AI agents that are proven safe. Ideally we’d first colonize the universe (at least the simulated one) and then create AGI/agents; it’s less risky. We shouldn’t allow AI agents, and the people who create them, to permanently change our world without listening to us at all, as is happening right now.
  11. We need to know what exactly our Effective Utopia is, and the narrow path towards it, before we pursue creating digital "gods" that are smarter than us. We can and should simulate futures instead of continuing to fly into the abyss. One freedom too many for the agentic AI and we are busted. Rushing makes thinking shallow. We need international cooperation and the understanding that we are rushing to create a poison that will force us to drink it.
  12. We need a working science and technology of computational ethics that allows us to predict dystopias (an AI agent grabbing more and more of our freedoms, until we have none, or can never grow them again) and utopias (slowly, direct-democratically growing our simulated multiverse towards maximal freedoms for the maximal number of biological agents, until non-biological ones are mathematically proven safe). This way, if we fail, at least we failed together: everyone contributed their best ideas, we simulated all the futures, and we found a narrow path to our Effective Utopia... What if nothing is a 100% guarantee? Then we want to be all the more sure we did everything we could, and if we found out that safe AI agents are impossible, we outlawed them, the way we outlawed chemical weapons. Right now we’re on track to fail because of a few white men who greedily thought they could decide for everyone else, and failed.
  13. The sum of AI agents’ freedoms should grow more slowly than the sum of humans’ freedoms; right now it’s the opposite. No AI agent should have more freedoms than an average human; right now it’s the opposite (they have almost all the creative output of almost all humans, dead and alive, stolen and uploaded to their private "librarian brains," which humans are forbidden from exploring and can only get short quotes from).
  14. The goal should be to direct-democratically grow towards maximal freedoms for the maximal number of biological agents. Enforcement of anything upon any person or animal would gradually disappear, and people would choose the worlds they live in. You’d be able to be a billionaire for a hundred years, or relive your past, or forget all that and live on Earth as it is now, before all this AI nonsense. It’s your freedom to choose your future.
  15. Imagine a place that grants any wish, but there is no catch: it shows you all the outcomes, too.

    You can read more here 
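Here is the sketch promised in proposal 4: a minimal "CHANGE BUTTON" in hypothetical Python. The names and the simple half-of-the-electorate threshold are my assumptions; a real version would sit on top of a platform like pol.is:

```python
from dataclasses import dataclass, field

@dataclass
class ChangeButton:
    # Hypothetical: if at least half of all registered voters back a
    # proposed change, the AI's configuration is updated; otherwise
    # nothing happens. Threshold and fields are illustrative only.
    electorate_size: int
    supporters: set = field(default_factory=set)

    def press(self, voter_id: str) -> None:
        self.supporters.add(voter_id)

    def tally_and_apply(self, config: dict, change: dict) -> dict:
        if 2 * len(self.supporters) >= self.electorate_size:
            config = {**config, **change}  # the change passes
        self.supporters.clear()            # each proposal is a fresh round
        return config

config = {"autonomy": "none", "speed": "human-paced"}
button = ChangeButton(electorate_size=4)
button.press("alice")
button.press("bob")  # 2 of 4 voters: the change passes
config = button.tally_and_apply(config, {"speed": "slower"})
print(config)  # {'autonomy': 'none', 'speed': 'slower'}
```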
