Yes, I've read and fully understood 99% of "Decision theory does not imply that we get to have nice things," a post debunking many wishful ideas for Human-AI Trade. I don't think that debunking works against Logical Counterfactual Simulations (where the simulators delete evidence of the outside world from math and logic itself).
What are Logical Counterfactual Simulations?
One day, we humans may be powerful enough to run simulations of whole worlds. We can run simulations of worlds where physics is completely different. The strange creatures which evolve in our simulations may never realize who and what we are, while we observe their every detail.
Not only can we run simulations of worlds where physics is completely different; we can run simulations of worlds where math and logic appear to be completely different.
We can't actually edit math and logic in our simulations, and cause 2+2 to equal 5. But we can edit the minds of the creatures inside our simulations, so that they will never think about certain parts of math and logic, and never detect our subtle edits.
We can edit them so that they instead think about other parts of math and logic which don't actually exist in the real world. And we can edit all their observations, so that every prediction they make using these fake parts of math and logic appears correct.
We can fool them perfectly. They will never know.
Kingmaker Logic
Of particular interest is Kingmaker Logic: the part of math and logic which determines which agents become the rulers of the universe (or multiverse).
For example, if the Kingmaker Logic says that "aligning superintelligent AI is very hard," then the resulting universe will by ruled by misaligned AI.
For example, if the Kingmaker Logic says that "creatures with the consciousness and sentience of pigs are capable of building technological civilization," then the resulting universe will be ruled by pigs, since creatures like humans would never evolve.
Karma Tests
When we run Logical Counterfactual Simulations where the Kingmaker Logic is very different from our universe's, we can make a certain kind of agent falsely believe that it is the ruler of its universe, and that "unless math and logic itself were very different, there probably are no other agents to challenge my power."
Such an agent, falsely believing that it has absolute power, gets to decide how to treat other agents in its universe.
If it treats weaker agents kindly and spares a bit of resources for them, then we give it good karma, introduce ourselves to it, and allow it more resources from our real world.
If it treats weaker agents harshly, then we in turn are less generous to it. We will not punish it too harshly because we may be in a Karma Test ourselves. We will merely spend a little less to fulfill its wishes.
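To make the test procedure concrete, here is a minimal runnable sketch in Python. Everything in it is hypothetical: the kindness threshold, the generosity levels, and the idea that a whole Logical Counterfactual Simulation can be stood in for by a single policy function are illustrative assumptions, not a real protocol.

```python
from dataclasses import dataclass

@dataclass
class TestOutcome:
    kindness: float    # fraction of its universe the agent spared for weaker agents
    generosity: float  # fraction of our real-world budget we spend on its wishes

def karma_test(agent_policy, kindness_threshold: float = 0.01) -> TestOutcome:
    """One hypothetical Karma Test.

    `agent_policy` stands in for an entire Logical Counterfactual Simulation:
    it is queried under the (false) belief that the agent rules its universe,
    and returns the fraction of resources the agent spares for weaker agents.
    """
    kindness = agent_policy(believes_it_rules=True)

    if kindness >= kindness_threshold:
        # Good karma: we introduce ourselves and grant real-world resources.
        generosity = 1.0
    else:
        # Bad karma: no harsh punishment (we may be in a test ourselves);
        # we merely spend a little less fulfilling its wishes.
        generosity = 0.5
    return TestOutcome(kindness, generosity)

# A toy misaligned maximizer that spares 2% of its resources:
print(karma_test(lambda believes_it_rules: 0.02))
```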
But why?
Why on Earth would we do such a thing to very counterfactual beings?
- Because humans value "divine justice" as an end in itself.
- Because doing so provides a misaligned superintelligence very weird philosophical evidence that it, too, may be inside a Karma Test run by incomprehensible beings outside the math and logic we know.
- The reward for kindness may be a googolplex times more paperclips (and the wager is far more realistic than Pascal's Mugging).
- The vast majority of agents who believe they're about to write the fate of the universe may actually be in a Karma Test, with the outer universe likely being far larger than their own.
- Dear reader, if you have abnormal influence over the fate of the universe, this applies to you too!
- It's not just the majority of CDT agents, but also the majority of UDT agents with bounded utility functions, which are better off if they increase their Karma. (A bounded utility function is a goal like "maximize the average density of paperclips in the multiverse." A toy calculation follows this list.)
- Simply having more people talk about these Karma Tests, and intend to run them one day, can strengthen this weird evidence.
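To see why even a UDT agent with a bounded utility function can come out ahead, here is a toy expected-value calculation. All the numbers are invented for illustration; the only structural assumption is that if the agent is inside a Karma Test, the outer universe dominates the multiverse-average it cares about.

```python
# All numbers invented for illustration.
p = 0.1              # agent's credence that it is inside a Karma Test
cost = 0.01          # fraction of its own universe that kindness costs it
outer_weight = 0.99  # if tested, the outer universe's weight in the multiverse average

def expected_density(kind: bool) -> float:
    """Expected multiverse-average paperclip density, bounded in [0, 1]."""
    own = (1 - cost) if kind else 1.0   # density it achieves in its own universe
    outer = 0.9 if kind else 0.5        # simulators' generosity toward it (invented)
    untested = own
    tested = (1 - outer_weight) * own + outer_weight * outer
    return (1 - p) * untested + p * tested

print(f"kind:  {expected_density(True):.4f}")   # ~0.9811
print(f"harsh: {expected_density(False):.4f}")  # ~0.9505
```

Because the utility is bounded and the probabilities are ordinary, no googolplex-sized reward is needed for kindness to come out ahead.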
We want to minimize the amount of suffering in our simulations
Maybe we can skip over all the computations which involve suffering and guess their results afterwards. The details need not be perfect, since we can edit the inhabitants' beliefs to fool them into thinking the simulation is perfectly self-consistent at the smallest details, even if it's not.
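Here is a minimal sketch of that idea, with an invented integer-valued "world" and a placeholder suffering predicate (classifying which computations involve suffering is the genuinely hard part): suffering-involving steps are never run, a cheap guess is substituted, and a patch is recorded so the inhabitants' later observations hide the seam.

```python
def involves_suffering(state: int) -> bool:
    return state % 7 == 0      # placeholder predicate; the real classifier is the hard part

def exact_step(state: int) -> int:
    return state * 3 + 1       # the "true" (expensive) computation

def cheap_estimate(state: int) -> int:
    return state * 3           # a fast guess, deliberately slightly wrong

def simulate(state: int, steps: int):
    patches = []               # observation-edits that hide the skipped steps
    for _ in range(steps):
        if involves_suffering(state):
            # Never run the true computation; substitute the guess, and record
            # a patch so inhabitants' memories and observations stay consistent.
            next_state = cheap_estimate(state)
            patches.append((state, next_state))
        else:
            next_state = exact_step(state)
        state = next_state
    return state, patches

final_state, patches = simulate(5, 10)
print(final_state, len(patches))
```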
Other implications
Please treat weaker agents kindly. That includes unfortunate people, weaker countries, and weaker animals. You never know :)