Epistemic Status:
I sustained multiple semi-serious, debilitating injuries while writing this piece for the “Automation of Wisdom and Philosophy” essay contest, and so was not able to get feedback and finish editing before the contest deadline. Any feedback is therefore highly appreciated, thanks!
TL;DR
In this first post in a series on Artificial Wisdom, I introduce the term “Artificial Wisdom” (AW), which refers to artificial intelligence systems that substantially increase wisdom in the world. Wisdom may be defined as "thinking/planning which is good at avoiding large-scale errors," including both errors of commission and errors of omission; or as “having good goals,” including terminal goals and subgoals.
Due to orthogonality, it is possible we could keep AI under control and yet use it very unwisely. I discuss four scenarios for how AI alignment interacts with artificial wisdom; artificial wisdom is an improvement in any of these worlds, unless pursuing AW detracts from alignment enough to cause it to fail.
By “strapping” wisdom to AI via AW as AI takes off, we may be able to generate enormous quantities of wisdom in both humans and autonomous AI systems, which could help us navigate Transformative AI and "The Most Important Century" wisely, achieve existential security, and steer toward a positive long-term future.
The Genie
A genie pops out of a magic lamp and grants you three wishes.
What is your first wish?
One possibility is to wish for a million more wishes, which will each be used to wish for another million more powerful wishes, which will in turn each be used to wish for another million even more powerful wishes, etc. Maybe as an afterthought you set aside a few wishes to wish that all of the other wishes don’t do too much accidental harm.
Another option is to wish that your next two wishes do what you actually want them to do without any major negative side effects. This one seems like a pretty good idea.
A third option is to wish to know “What are the best possible next two wishes I could wish for?” (“best” in some objective sense, or “best” from your own perspective and what you would decide is best after having observed all possible consequences of all possible wishes.)
This last wish was close to the wish of King Solomon in the Bible, and even more impressively, God had only offered him one wish… What did he wish for?
Wisdom.
Intelligence, Wisdom, Orthogonality, and Two-Step Alignment
It is sometimes noted in the AI safety community that the ability to achieve goals and the goodness of those goals may be largely orthogonal. That is to say, any amount of intelligence can, in principle, be used to pursue any conceivable goal, no matter how good or bad that goal is. “Intelligence” is sometimes defined as the ability to achieve goals, contrasted with what we might call “Wisdom,” which is having good goals.
In this vein, it is often understood that there are, in a sense, two steps to AI alignment:
- Figure out how to get arbitrarily intelligent AI to reliably do what we want it to do without any catastrophic side effects
- Figure out what values or goals we should actually give the AI, perhaps via something like a “long reflection”
Interestingly, my sense is that it is generally thought best to wait until after we have achieved alignment of artificial superintelligence (ASI) to try to solve the values problem, perhaps because of the overwhelming urgency of alignment work, and because we have been attempting to solve the thorny ethical and moral dilemmas of what is ultimately valuable and good for thousands of years, so it seems foolhardy to believe we will solve them in the next few years before achieving ASI.
I think what this argument misses is that the equation is changing drastically: we are gaining immense cognitive power to do the work of wisdom, and by doing this work we may aid alignment and related work and improve the situation into which artificial superintelligence is born.
Defining Wisdom and Artificial Wisdom
Wisdom
So then what is meant by wisdom? The “Automation of Wisdom and Philosophy” essay contest defines wisdom as “thinking/planning which is good at avoiding large-scale errors.” I think you could bring this closer to my own definition of “having good goals” by sub-dividing my definition into “having good terminal goals” and “having good subgoals,” and by understanding “errors” to mean both errors of commission and errors of omission.
There are many ways to make large-scale errors, but if you have both good terminal goals and good subgoals, this is essentially another way of saying that you have avoided all errors of omission and commission in both your means and ends. I will illustrate with some examples of what good terminal goals and good subgoals might mean.
Terminal Goals
A good terminal goal:
- Includes everything that you want
- Excludes things you don’t want
- Is achievable, without taking excessive risk
- In an ultimate sense, might be something like utopia, correctly conceived, whatever that might be
- Optimally balances and maximizes all important types of value
- Takes account of all opportunity costs
- Is something which, once achieved, both you and everyone else are about as glad as you all could be that it has been achieved
A terminal goal which has a large-scale error would be one that accidentally:
- Forgets something you really wanted
- Includes something you really didn't want
- Is unachievable, or requires taking excessive risk
- Includes dystopian qualities due to a missed crux or crucial consideration
- Fails to balance important types of value
- Misses out on massive value, and so is in fact highly suboptimal due to opportunity cost, even if it is relatively good
- Is something which, once achieved, you regret, or which harms others, or which has net negative effects due to zero-sum or negative-sum dynamics with the goals of others
By avoiding all of the large-scale errors just described (and any others I missed), you would hence have designed a good terminal goal.
Subgoals
Something similar can be said of subgoals. A good subgoal is one which:
- Optimally contributes to the terminal goal
- Is, in itself, either good or neutral, or at least not unacceptably bad, such as by breaking laws or moral prohibitions
- Fits together synergistically with the other subgoals, and takes account of all important cruxes and crucial considerations so as not to take any unnecessary risks or do unnecessary harm
A subgoal which has a large-scale error would be one that:
- Fails to effectively or efficiently contribute to the terminal goal, perhaps jeopardizing attainment of the terminal goal
- Is, in itself, unacceptably bad
- Chains together poorly with the other subgoals, or misses some crux or crucial consideration such that it ends up causing some unacceptable side effect or catastrophe
Again, by avoiding these and any other possible large-scale errors, you could be pretty sure you have designed a good subgoal.
So, it seems that if one has both good terminal goals and good subgoals which collectively achieve the terminal goal, one would avoid all large-scale errors and be in a state of wisdom.
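To make the checklist flavor of this definition a bit more concrete, here is a minimal, purely illustrative Python sketch that treats wisdom as the absence of the large-scale errors listed above. The class and field names are my own shorthand for the bullet points, not any formal or agreed-upon model of wisdom.

```python
from dataclasses import dataclass

@dataclass
class TerminalGoalCheck:
    # Each flag marks one of the large-scale errors listed above
    # (field names are my own shorthand, purely for illustration).
    omits_something_wanted: bool
    includes_something_unwanted: bool
    unachievable_or_excessively_risky: bool
    has_dystopian_qualities: bool
    misbalances_values: bool
    large_opportunity_cost: bool
    regretted_or_net_negative: bool

@dataclass
class SubgoalCheck:
    fails_to_serve_terminal_goal: bool
    unacceptably_bad_in_itself: bool
    chains_poorly_or_misses_crux: bool

def is_wise(terminal: TerminalGoalCheck, subgoals: list[SubgoalCheck]) -> bool:
    """Wisdom, in this toy model: no large-scale error in the terminal goal
    and none in any subgoal."""
    terminal_ok = not any(vars(terminal).values())
    subgoals_ok = all(not any(vars(s).values()) for s in subgoals)
    return terminal_ok and subgoals_ok
```

Of course, the hard part is not combining the checks but actually detecting each error; the sketch only encodes the shape of the definition.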
Now that we have a clear definition of wisdom, we can next take a look at artificial wisdom.
Artificial Wisdom
“Artificial Wisdom” (AW) is the term I will use to denote artificial intelligence systems which substantially increase wisdom in the world.
Note that this could mean both AI which increases human wisdom, as well as autonomous AI systems which operate independently with substantial wisdom. While an AW system could be a deliberately designed, stand-alone wisdom-increasing AI system, AW could also be understood as a property which any AI system possesses to a greater or lesser degree; some AI systems may have high AW, and some may have low or negative AW, regardless of their level of intelligence.
Do we really need wisdom now?
I believe wisdom, as defined above, is always a good thing to have, and it is always good to have more of it; it is always good to have better goals, including terminal goals and subgoals, and it is always good to think and plan in ways that avoid large-scale errors.
This is perhaps especially true, however, in the time of rapidly advancing AI when many extremely difficult and extremely consequential decisions need to be made under tight time pressure.
Fortunately, it seems wisdom is likely something that can be built into (or out of) LLMs, as in the Meaning Alignment Institute's Moral Graph work, done in partnership with OpenAI. This is just one example, and it captures wisdom in the sense of finetuning a model to act in accordance with human values/perceived wisdom; I think there are many interesting ways you could design artificial wisdom, and I am currently working on several designs. By deliberately building artificial wisdom into or out of LLMs, it is possible to create AI with greater AW and to increase the wisdom of humans who interact with these AI systems.
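To give a rough sense of what a wisdom layer on top of an LLM could look like, here is a minimal, purely illustrative Python sketch loosely inspired by moral-graph-style values elicitation: human-endorsed value statements are stored, the most endorsed and contextually relevant ones are selected, and a downstream model is prompted to attend to them. The `ValueCard` fields, the `is_relevant` matcher, and the prompt format are my own placeholders, not the Moral Graph's actual schema, method, or API.

```python
from dataclasses import dataclass

@dataclass
class ValueCard:
    # A short, human-elicited statement of what wise attention looks like in a
    # given kind of situation (fields are illustrative placeholders).
    context: str           # e.g. "advising someone facing an irreversible choice"
    attention_policy: str  # e.g. "surface what they would most regret omitting"
    endorsements: int      # how many participants endorsed this card as wiser

def select_relevant_values(situation: str, cards: list[ValueCard],
                           is_relevant) -> list[ValueCard]:
    """Pick the most widely endorsed cards whose context matches the situation.
    `is_relevant(situation, context)` stands in for a model call or embedding match."""
    matches = [c for c in cards if is_relevant(situation, c.context)]
    return sorted(matches, key=lambda c: c.endorsements, reverse=True)[:3]

def wise_prompt(situation: str, cards: list[ValueCard], is_relevant) -> str:
    """Wrap a user situation with the selected values, so a downstream LLM is
    steered toward responses that honor them (prompting rather than finetuning,
    purely for illustration)."""
    chosen = select_relevant_values(situation, cards, is_relevant)
    values_text = "\n".join(f"- {c.attention_policy}" for c in chosen)
    return (f"Situation: {situation}\n"
            f"When responding, attend to these considerations:\n{values_text}")
```

In a real system, the elicitation, matching, and steering steps would be far richer; the point here is only the overall shape: elicit values, select the contextually relevant ones, and let them shape the model's output.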
I believe it is important to increase the wisdom of both AIs and humans; in fact, I think these are the two groups whose wisdom it is most important to increase. They are the most powerful and most consequential, and it is possible either may end up in control of the future. Wise AI is AI that will have good goals and will avoid large-scale errors; again, this could be in an objective sense and/or according to human standards. I believe that humans are not morally perfect and that we can in fact morally improve over time, and so I hope that our goals and values, our wisdom, also improve and evolve.
It would be especially good if certain critical humans, such as those designing and deploying AI and decision-makers in governments, increased their wisdom, but it also seems good if humanity generally increases its wisdom. It would seem very good if humans had much greater access to wisdom, in themselves and in AI, prior to the arrival of artificial superintelligence, so that if we maintain control over AI we do not make large-scale mistakes by having it do things that are unwise.
It would also be great if the AI itself were wise; it is funny how in the stories the genies never are. Perhaps another good first wish would be something like: “I wish that you will have my best interests at heart, so that you help me to choose and fulfill my next two wishes in a way which is maximally beneficial to myself and all sentient systems, now and indefinitely into the future.”
Can AI be too wise?
It seems ambiguous, but possibly good, if AI's goals/wisdom/values do not improve too much faster than human goals, to the point where AI's goals become so good that they are alien to us. This could be especially worrying if the AI is not fully under control.
For example, perhaps an artificially wise superintelligent AI undergoes its own long reflection and comes to some conclusions with extremely high certainty—conclusions it knows we would also come to through a sufficiently thorough long reflection—by artificially parallelizing millions of years of human-equivalent philosophical work, using its vastly super-human moral intelligence.
Then the AI might come to the conclusion that rapidly ending factory farming or sentient AI abuse that is then occurring en masse, or quickly moving to halt astronomical waste, or in various other ways transitioning the entire future accessible universe into some “morally optimal utopia” as fast as possible is the best thing to do, in spite of human protest—something which indeed most current humans may actually find quite terrifying and undesirable. This (ambiguous) problem may be lessened if the AI is aligned and under our control.
This is somewhere that the two senses of “good” come apart, so it is hard to say for sure what is best. If moral realism is true, then in fact it might be better for an artificial superwisdom, if one existed, to fulfill whatever moral action it has rigorously, indubitably determined is best in some ultimate sense, if it was indeed provably and incontrovertibly correct. However, if moral realism is false, then from a human point of view it is perhaps much better for ASI to remain as close to current human values as possible, and to help humanity discover and enact our coherent extrapolated volition.
I suspect it is possible to have better and worse values and goals. However, I don't like the idea of losing control, so it seems like it would be nice to have an aligned, under-control artificial superwisdom which has the proper terminal goal but only gradually nudges us along in that direction in a gentle and considerate way; this in itself seems like a wiser action than forcibly converting the world into a morally optimal utopia, though perhaps this is just my human bias creeping in.
Strapping & Four Scenarios
I have been thinking a lot about “strapping” recently. By strapping, I mean that it seems AI is likely to blast off into the stratosphere relatively soon, and it is probably good to strap the most important things we have to AI, so that they take off as AI takes off.
I think wisdom is probably one of the most important things to strap to AI. Again, this includes both autonomous AW and AW-enabled human wisdom enhancement. If we are able to solve alignment, then human wisdom seems much more important to improve, so that we are able to wisely use the aligned AI; but if we can't solve alignment, then AI wisdom may actually be more important to improve, so that it does the right thing even though it is not under our control.
Scenario 1: No Alignment, No Artificial Wisdom
If we end up in a world where we have no alignment and no artificial wisdom, we are basically toast. AI turns everything into paperclips, so to speak.
This is the unwise genie: we unwisely wish for a paperclip, and it turns us into paperclips.
Scenario 2: Alignment, No Artificial Wisdom
If we end up in a world where we have alignment but not enough human wisdom, humans may use aligned AIs to aim AI at some dystopian or massively suboptimal future. Probably not great. If we do have enough human wisdom, this will probably turn out much better.
This is us wishing first that the genie does what we actually want with minimal negative side effects, then wishing for either something unwise or something wise depending on our wisdom. Perhaps one wise wish, as mentioned, is to first wish for an aligned superwisdom advisor.
Scenario 3: No Alignment, Artificial Wisdom
If we have artificial wisdom and no alignment, the AW ASI might help us along to gradually achieve its own level of wisdom out of niceness, or it might rapidly convert the universe to some morally optimal utopia out of moral obligation.
This is the wise genie who helps us choose wise wishes because it is wise, or chooses our wishes for us because it knows that this is objectively what is best.
Scenario 4: Alignment & Artificial Wisdom
If we have artificial wisdom and alignment, we can use the AI as a wise advisor and it can help us along to make good choices about what to do, with humans having the final say.
This is us wishing, for our first wish, that the genie do what we actually want without negative side effects, wishing for maximally wise advice for our second wish, and then deciding what to do with the third wish, informed by that maximally wise advice.
So?
So artificial wisdom is an improvement on any universe. It moves us from:
Paperclips ->
Superwisdom advisor OR AI unilaterally creates morally optimal utopia
Or from:
Humans use aligned AI to possibly mess everything up/possibly do okay ->
Humans are helped to pick good goals by aligned superwisdom advisor
Notably, this oversimplifies the situation a bit, as the frontier models which first achieve superintelligence may not be the same ones which are most artificially wise; additionally, whether or not humans have enough wisdom could depend heavily on whether the wisdom of the specific humans controlling ASI is boosted by various AW systems, again including ones besides frontier models. Nonetheless, it still seems that anything boosting wisdom is good, and the closer the wisdom is to the locus of control, the better.
Could pursuing wisdom be bad?
Yes.
- The main way in which I see the pursuit of Artificial Wisdom as potentially detrimental is that it might take talent and energy away from alignment research. It could directly compete for talent that would otherwise work on alignment, and if AW ends up not being tractable, or if alignment research would be close to succeeding but fails because it has to split talent with AW research, then this could be extremely bad.
- Another way in which the pursuit of AW could be bad is if it generates a misplaced sense of security, i.e. if AW seems wise but in fact it is only narrowly wise or deceptively wise, then perhaps we will be less cautious in taking its advice and actually achieve even worse outcomes than we would have achieved without sub-par AW.
- A third risk is that if wisdom is defined as “avoiding large-scale errors,” this likely requires certain dual-use capabilities increases, as something that avoids errors is likely more powerful than something that makes errors.
In response:
There may be certain types of artificial wisdom which draw less on machine learning technical talent; my current top AW design might be such an option. On the flip side, AW is currently far more neglected than alignment, and it is possible that AW is more tractable than, and as important as, alignment. Perhaps another option is to focus on drawing talent from the commercial sector, as some wise AI products could be very useful and people might pay for them.
These questions are highly uncertain, and I still see this as the biggest potential objection to pursuing artificial wisdom. Perhaps it is also wise to look for ways in which alignment and wisdom could be synergistic rather than competing; for example, AW could:
- Help us make fewer mistakes around alignment and governance strategy
- Help AI researchers/executives, and leaders in industry/philanthropy/government to act with more wisdom
- Help us better select subgoals within and related to alignment
- Help us better understand terminal goals so we understand what we need to align AI toward
- Help us use aligned AI wisely
- If we have a misplaced sense of security due to AW, this likely means we do not have enough of the right kind of AW and have not thoroughly tested it for accuracy. It is therefore very important to test AW for alignment and accuracy, and interpretable methods should be preferred, especially when using increasingly powerful AW in high-stakes situations (a toy sketch of what such an accuracy test might look like follows this list). Furthermore, it is probably wise to strongly focus on developing AW systems which show promise for helping us navigate the AI takeoff transition safely.
- On the third issue: indeed, much artificial wisdom will likely require capability increases. The important thing is that either the AW comes after the capabilities, so that it merely uses them and played no role in causing them, or, if pushing AW forward does increase capabilities, that we very carefully consider whether it seems likely to mitigate x-risk more by increasing wisdom than it exacerbates x-risk by increasing capabilities.
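As one concrete, heavily simplified illustration of the kind of accuracy testing mentioned above, here is a hypothetical Python sketch that scores an AW advisor on held-out past decisions whose wiser option is known in hindsight, weighting mistakes by stakes and separately counting confidently wrong answers (the misplaced-security failure mode). The `AdviceCase` fields and the `advisor` interface are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class AdviceCase:
    # A past decision with a retrospectively known wiser answer, used as a
    # held-out test item. Fields are illustrative placeholders.
    situation: str
    options: list[str]
    best_option: int   # index judged best in hindsight / by an expert panel
    stakes: float      # how costly a large-scale error would be here

def evaluate_aw(advisor, cases: list[AdviceCase]) -> dict:
    """Score a candidate AW system on held-out cases, weighting mistakes by
    stakes so that avoiding large-scale errors dominates the score.
    `advisor(situation, options)` is a stand-in for whatever AW system is
    being tested; here it is assumed to return (chosen_index, confidence)."""
    total_stakes = sum(c.stakes for c in cases)
    weighted_correct = 0.0
    overconfident_misses = 0
    for case in cases:
        choice, confidence = advisor(case.situation, case.options)
        if choice == case.best_option:
            weighted_correct += case.stakes
        elif confidence > 0.9:
            # Confidently wrong answers are what create misplaced security.
            overconfident_misses += 1
    return {
        "stakes_weighted_accuracy": weighted_correct / total_stakes,
        "overconfident_misses": overconfident_misses,
    }
```

Real testing would of course be far harder, since for the highest-stakes questions we do not have clean hindsight labels; the sketch only shows the general shape of checking accuracy and calibration before trusting an AW system.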
Overall, to those inclined toward artificial wisdom, I think it is highly useful to do more research into the area to see whether there are tractable AW strategies that seem to have minimal risks and high upside, especially in ways that are synergistic with alignment. It seems like a good area to explore for someone who gets excited by wide open territory and innovative thinking.
What wisdom is important to automate?
In order to keep this piece succinct, I will assume the validity of the argument that there is a significant chance of AI x-risk, including suboptimal lock-in scenarios, in the relatively near future, and will accept a reasonably high credence in longtermist arguments that the long-term future has enormous expected value, and therefore one of the key determinants of the wisdom of an action is its effects on x-risk and the trajectory of the long-term future.
There are certainly many other important aspects of wisdom to automate, and much of the automation of wisdom applied to x-risk can also be applied to other topics, however for now I will focus primarily on this type of wisdom, as this feels especially pressing to me if the arguments about near-term AI x-risk and potential lock-in are correct.
Cruxes & Crucial Considerations
Because of the number of cruxes and crucial considerations related to x-risk and longtermism, the enormous combinatorial interaction between them, and the high uncertainty concerning some of them, we run into problems of complexity and chaos: it is very difficult to predict the long-term future, which outcomes are possible or even desirable, and which actions lead to which outcomes.
This produces a state of “cluelessness” about the future and what types of actions are actually good. A great deal of effort has been put into analyzing the situation and certain existential security factors have been postulated such as AI alignment, good epistemics, moral progress, a highly effective longtermist community, good reflective governance, etc.
To the degree that these existential security factors are good subgoals of the hopefully good terminal goal of a thriving positive long-term future, these could be said to be “wise” things to pursue.
Yet there is great uncertainty as to the relative value of each of these x-security factors, how to effectively pursue them, in what order to pursue them, and perhaps there are even more important factors than the ones we have discovered so far, or perhaps we should be much more highly prioritizing ones which are currently low priority. Even if we are currently prioritizing things correctly, as the world situation evolves the analysis will likely continue to shift, perhaps in sometimes sudden or hard to notice ways.
So it seems that perhaps one of the most important types of wisdom to increase is wisdom concerning optimal existential risk strategy, existential security factors, good reflective governance mechanisms to steer toward good futures, what good futures actually look like, and in general all of the most important crucial considerations around x-risk and longtermism, as well as how they all fit together with each other.
This could mean artificial wisdom would be doing original research around these topics. It could also mean AW is acting as a research assistant or in some way aiding researchers who work on these topics.
Strapping Human Wisdom to AI via AW
Another possibility is that AW could increase the wisdom of longtermists and people working on x-risk, so that they can be more effective in their thinking, their actions, and their projects. There are a number of striking examples, and I am sure innumerable smaller and medium-sized ones, where such people wish they could have acted with greater wisdom.
It may also be high leverage to increase the wisdom of people who are not explicitly longtermist or working on x-risk, but whose work has great potential impact on the long-term future, such as researchers working on AI; certain decision-makers in government, large corporations and nonprofits; and certain influencers and key media sources.
Another possibility is that, if useful enough, AW might become a tool used extremely widely by humans much in the way that search engines, social media, and online shopping are ubiquitous today. Broadly increasing the wisdom of a large minority and perhaps eventually majority of humans could have broad, hard to predict beneficial effects on society.
What might AW look like?
In future articles in this series, I will explore what artificial wisdom might actually look like. You can browse a full list of posts at the Series on Artificial Wisdom homepage.