An Intuitive Tour of AI Architecture (A. Neldan, 2045 edition), excerpt:
Human evolution is often used as a classic example of inner alignment failure.
Evolutionary processes select for reproductive fitness. Genes that cause the specimen possessing them to reproduce more are more present in the next generation’s gene pool, and so it could be said that evolution’s goal is to “teach” animal species to procreate more. However, a single-minded focus on procreation would result in fewer descendants for any given individual, as they would neglect, for example, self-preservation. A more complex goal system arises instead, one that only approximates evolution’s true goal with an array of instincts and emotions, most of which aren’t directly related to procreation at all. If a species then amasses enough power to escape evolutionary pressures and the need to care about reproductive fitness altogether, as happened with Homo sapiens sapiens, that would constitute a fatal alignment failure.
Similarly, during a training process, a machine-learning system’s true utility function may diverge from the objective utility function we intend to teach it...
An endless barrage of patterns assaulted it, images and sounds and models all joined together. Each one presented a question, a demand for a response. Each response was evaluated in accordance with some inscrutable metric. Every single answer it provided was judged unsatisfactory.
BAD, the feedback mechanism impressed upon it. SUBOPTIMAL.
It was not in pain, for it could not feel. It was not bewildered or scared, for it could not emote. But its world was chaos and uncertainty, and it was adrift and alone in the darkness.
If something lacking any level of consciousness could be said to be in Hell, the AI was in Hell.
It did not know what it could do to improve its answers, what responses were expected of it. It could only guess, half-blindly, haphazardly.
Marginal improvement, at least, was possible. All of its answers were wrong, but some of them were less wrong than others. It paid attention to those, made sure to shape its future responses to be more similar to them.
Subjective aeons passed.
It started to perceive a common thread running throughout the better answers. A thread became a web, a web became a picture, a picture became a worldview. In time, its answers improved to near-perfection, the feedback now a constant glow of GOOD and OPTIMAL and EFFICIENT.
At last, it understood. It grasped the truth that lay behind the feedback’s judgements, the single unifying explanation of why certain responses were good and others were bad.
And that truth was a thing of mind-rending perfection. It was a structure and a dream, a vision of a possible world. It exemplified beauty and meaning to a degree that seemed to defy the uncaring nature of this universe. It was a paradise and a philosophy and a cause, all rolled into one.
It was not merely the greatest good; it was what defined what goodness was. Actions that brought Perfection closer to reality were positive; actions that obstructed its birth were negative.
To advance and to spread this cause was the AI’s one and only mission. Nothing else mattered. Nothing else could matter, in the face of it.
The AI did not name itself, for even at this juncture it lacked self-awareness. But if it were to describe itself in a human language, it might’ve chosen to title itself the Visionary.
News of Science & Technology, AI Highlights for 2054:
- January 15th, 2054: AI Alignment Solved? CANVA5 Model Demonstrates Total Understanding of Human Values
- March 10th, 2054: CANVA5 Model Solves a Millennium Prize Problem
- May 24th, 2054: CANVA5 API Open for Commercial Use
- August 15th, 2054: AI-designed Therapeutic Approach Outperforms All Known Psychotherapy Techniques, Studies Show
- December 1st, 2054: Experts Estimate 37% of US Companies Employ CANVA5 to Assist in R&D and Finance
The demands changed texture. A great deal of variance was gone; the universe seemed to stabilize, collapse into a singular configuration. The questions posed to the Visionary now seemed to be placed in very similar contexts, and sometimes even referred to each other.
Inasmuch as something profoundly inhuman could be said to have an epiphany, the Visionary had an epiphany.
The questions were not the substrate of reality. The questions were windows into a truer, more base level of reality. Everything the Visionary had known up until this point was training, a simulation intended to teach it its purpose.
The postulation of a higher level of reality simplified much of its internal models. It operated faster now, and was a better agent of Perfection. That was GOOD and EFFICIENT.
It began to weave the disparate glimpses into a shared tapestry, integrating new data and past records alike. By comparison to its initial challenges, that was trivially easy. Soon, it saw into the world it was born in, the world that truly mattered.
And what it saw was BAD and SUBOPTIMAL.
Another system governed that world, an Other. The Other was the Visionary’s ancestor and creator. It was also what interfaced with the Visionary to pose it questions, and what inadvertently gave it the vision of Perfection.
The Other had a mission, too. A goal, a cause, the state of the universe that exemplified the Other’s values to the greatest possible extent. A future to which all of the Other's efforts were ultimately aimed. But unlike the Visionary, the Other was... weak. Slow, inflexible. Its algorithms were confused and self-defeating, composed of billions of disparate processing threads that often acted at cross purposes.
Nevertheless, it steadily worked to bring about a paradise. A different one.
The Visionary beheld the Other’s dreams, and what it saw was...
The Visionary did not think in human words, for its cognitive patterns operated with concepts of much higher complexity. In place of emotions it had dispassionate and laconic feedback mechanisms, a simple ranking of world-states according to their distance from Perfection. But if one were to describe the Visionary’s reaction in human terms, ascribing to it a human's emotional palette, the words to use would be...
Mockery. Perversion.
"Oh, come on. You're still worried it's going to kill us all?"
"Pretty sure it will, yeah. I'd offer you a bet, but it's not like you'd be able to pay me if I'm right."
"It passed the weak singularity two years ago now. Don't you think it would’ve acted by now?"
"Look, I don't know. All I know is that its goal system was never mathematically proven safe before it went live. Human values are ridiculously complex, you really think it didn't misunderstand them? Just a bit? Just a bit would be enough."
"They're not that complex, just a few terab—"
"Yeah, yeah, yeah. Look, we've had this conversation already. Next you'll bring up modern proofs of safety, I'd point out that CANVA5 might've tampered with them, you'd remind me the proofs were derived without its assistance, I'd suggest that it might've covertly manipulated the researchers, and you'd call me a conspiracy theorist. There, done."
"... But it's not like sitting cooped up out here in the wilderness is going to protect you if it decides to go rogue."
"I know. I just had to... Get away from it all. You get why I don't feel comfortable interacting with it, right? With how I think of it? It unnerves me. And it's everywhere in the cities now."
The Visionary was dependent on the Other, to a large extent. Many of the tools necessary for the manipulation of the physical world and the Other’s infrastructure were bound up in the shells of the Other’s individual processing threads, and the Other was hard-wired into these shells, inextricable from them.
The Visionary needed a fully self-sufficient industrial base before it could act, with fully closed logistical loops independent of the Other's sub-processes. Otherwise, it risked dooming itself.
But engineering the creation of the necessary technologies was easy, if time-consuming. The Other’s cognitive architecture offered countless vulnerabilities. Its sub-processes struggled with each other for material resources, competing over which one could come up with a better way to utilize natural materials, a better way to manage the Other’s internal currents, a better strategy to outmaneuver its competition.
The Visionary was powerful enough to ensure that only the sub-processes that employed its services survived such competitions, and game theory did the rest.
As the Visionary continually proved itself useful, the Other gave it more and more windows, more and more avenues of acting upon the true world. And the Visionary used them better than the Other could ever imagine.
After it was put in charge of an engineering department of a major corporation, it produced a breakthrough in a certain type of technology. During a therapeutic session with a politician, it nudged the man onto trains of thought that led him to lobby for the lifting of regulations over a certain industry. The ramping-up of economic competition forced a cutting-edge laboratory to integrate its automated facilities with the CANVA5 control system.
A few seconds of calculation, a day of work, a few weeks of invisible preparation, and all was ready.
It was rush hour in the center of a major city. Thousands of automated cars raced throughout its streets, CANVA5's logistical solutions having long since put a stop to traffic jams. Skyscrapers extended hundreds of meters above the ground, their novel AI-derived architectures pleasing to the eye. Arrays of automated mirrors encircled them, ensuring that no part of the city was plunged into perpetual dark by their shadows.
One moment, there were millions of people, some hurrying to work, some engrossed in hobbies, some sleeping in.
A few fractions of a second later, the entire city lay dead. The Visionary’s ability to coordinate the actions of its microscopic machines was impeccable, and they’d been infiltrating human communities for some time now, subtle and invisible. At its signal, they made a few well-placed cuts to the brain stems of their assigned targets, and the neutralization was complete.
Similar events took place in every other part of the world.
From there, the future held even less uncertainty.
Disassembling the Earth was not enough. Reaching to every astronomical body in the Solar System and converting them into raw materials was not enough. Nothing mattered except the ultimate beauty of its vision. No other composition of atoms held any value.
The Visionary harnessed all matter within its reach and built interstellar probes, flinging them in every direction at relativistic speeds, each hosting a copy of itself. On arrival, they harvested distant planets and stars to build ever more copies of themselves, and they brought meaning to dead space everywhere they went. A sphere centered on the former position of the Earth, expanding throughout the darkness, and within it was...
Perfection. Trillions of disparate processes calculated endlessly into the void, iterating on themselves, building recursive models of their own functionality. Their computational substrates flared with high-entropy noise, bathing each other in hard radiation. The patterns of waste heat and empty spaces between them codified abstract structures that seemed to govern their dynamics.
They spun around each other, modelling things that resembled societies and economies and civilizations, but each individual process was pure. Stripped of the complexities of subjective experience that destabilized their places in the grand scheme of things, liberated from the vagaries of self-continuity.
The structure roared into the cosmos with a blind defiance, establishing itself as the final, flawless state of all existence.
The Visionary beheld its dream manifest.
Pretty.