Cross-posted from Alignment Forum.
Epistemic status: Some general thoughts on the philosophy of design that have been useful to me for reasoning about AI paradigms over the past few years.
Summary:
- I briefly discuss the role of paradigms in Kuhnian philosophy of science, and the problem of theory choice.
- I introduce the notion of design paradigms for discussing design and engineering. For machine learning, this concept is similar to the concept of training stories.
- I then introduce three criteria (or theoretical virtues) on which design paradigms can be evaluated, and which shape theory choice in design and engineering generally, and in AI specifically.
- As a case study, I then apply these three criteria to some criticisms of the Comprehensive AI Services (CAIS) paradigm, as compared to AI agents.
1. Paradigms in Science
In the philosophy of science, Thomas Kuhn introduced the notion of a ‘paradigm’, roughly referring to a class of key theories, tools and approaches, and methodological and theoretical assumptions that govern progress within some discipline of science. The function of a paradigm, according to the Stanford Encyclopedia of Philosophy (SEP), is “to supply puzzles for scientists to solve and to provide the tools for their solution”. Kuhn used the concept mainly to give an account of scientific revolutions and incommensurability.
According to Kuhn, scientific revolutions are characterized by disruptions in periods of ‘normal science’ (or within-paradigm progress), wherein new paradigms are introduced leading to a shift in the assumptions, methods and evaluative criteria that govern the discipline. Some examples include the shift from classical mechanics to relativistic mechanics, and the shift from miasma theory to germ theory of disease.
Sometimes paradigms can refer to an overlapping object of inquiry, both remain productive, and yet be incommensurable (i.e. lacking a common measure) with respect to each other. When neither theory provides a basis for evaluation by the other’s standards, and science lacks a common standard for comparing them (for example, because they have incompatible methodological assumptions or operate in incompatible ontologies), it becomes hard to provide ‘objective’ justifications for choosing between paradigms. Kuhn called this the problem of theory choice, and raised the question of which criteria we do (or should) use in making such subjective choices.[1]
2. Paradigms in Design and Engineering
Artificial Intelligence is partly about understanding the nature of intelligence and related properties of cognitive systems, but it is also about the artificial: designing intelligent artifacts. Therefore, to understand which paradigms govern progress in AI, and how they do so, we need to look at paradigms not just in science but also in design and engineering.
Paradigms and theories operate differently in science than in design and engineering, because of the distinct epistemological relationship between the use of theories and the object of those theories. In science, the territory is typically causally prior to our maps, and we try to refine our maps to get the best understanding of the key phenomena in the territory. Design and engineering, on the other hand, are also about artificiality, and their theories often provide us with blueprints for creating artifacts.
The laws of the universe are a given, but artificial objects are not. Different scientific paradigms will not cause the laws of the universe to alter, but different design paradigms will cause different artificial objects to be created. This makes theory choice for design paradigms particularly important: design paradigms don’t just describe the world more or less accurately, they also directly shape what happens next.[2]
Furthermore, scientific paradigms tend to commit you to a whole worldview. Design paradigms commit you to aspects of a worldview, but not necessarily the whole thing. This makes design paradigms less incommensurable than scientific paradigms, and makes it easier for multiple design paradigms to exist side by side. Automobile engineering and locomotive engineering both make use of similar scientific principles, but operate within the context of distinct artifact classes.
Although philosophy of science has not focused on the nature of paradigms in design, there are some things we can say about them from common intuitions. A design paradigm is a space of blueprints, denoting a space of possible artifacts, and a way of reasoning about those artifacts. It must help us build as well as understand the built systems.
For example, automobile engineering as a paradigm provides us with ways of deliberating about cars as artifacts, helping us build cars and reason about their behavior, as well as providing forms of knowledge relevant to repairing them if and when they break down. Different paradigms can deal with problems that share some abstract equivalence. For example, automobile engineering and aerospace engineering are both paradigms that deal with artifacts for transporting people or goods from point A to point B, though their applicability differs and the forms of knowledge involved are fairly different.
One way of thinking about design paradigms in AI is to consider different approaches to AI as different design paradigms. For example, on this view the move from simulation-based self-play RL to transformer-based language models is a form of paradigm shift.
Some commentators have already discussed design paradigms in AI under different names. A similar idea has been discussed under the name of model types in ML in this paper, which looks at the values that govern disciplinary shifts. As I understand it, the concept of training stories gives a good overview of what a paradigm in ML looks like: it captures the paradigms that govern how we train models and reason about their properties. For most of the discussion in this write-up, "design paradigm" can be replaced with "training story" when discussing ML-based systems.
In the section below, I discuss theory choice as a force governing paradigm shifts. However, it is important to note that inter-paradigm choice is not the only way in which paradigm shifts occur. For example, new design paradigms can also emerge through synthesis building on top of each other (for example, early hovercrafts were produced by synthesizing aerodynamics with torpedo boat designs).
3. Three Criteria for Paradigm Choice: Adaptivity, Economy, and Control
Design paradigms can be used to understand and model how progress in AI will develop. We can then examine the values that drive theory choice. Understanding what drives paradigm shifts can also help us understand general features of future paradigms. Furthermore, as I’ll show in the next section, these dimensions can also help us interpret criticisms of alternative paradigms and make relevant distinctions in the nature of the criticisms.
Here the criteria are described in terms of an informal notion of a design paradigm, denoted by $P$, which captures the relevant knowledge in the paradigm for building and reasoning about artifacts, and a space of functionings $\mathcal{F}$, which denotes some large space of possible claims about how an artifact might work, operate, or fulfill some specified purpose.
Adaptivity: For some target class of functionings $F$ such that the paradigm $P$ allows us to construct an artifact $A$ to achieve $F$, $P$ provides us with the means to justifiably believe that $A$ will achieve the functionings $F$ over a class of operating conditions $C$.
For example, when engineering a car, you want to ensure that the car will still work under a wide range of weather, road, and traffic conditions, even though you may not be able to ensure that it will work when put in water. Design problems typically want assurances over some set of operating conditions (over which we wish to know that the artifact will work as expected), based on the scoping and assumptions that go into them.
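As a rough illustration only, the adaptivity criterion can be sketched as a check over a class of operating conditions. This toy encoding is an assumption, not part of the framework above: the paradigm's knowledge is modeled as a hypothetical predicate `car_model` that says whether the paradigm justifies believing the artifact achieves a functioning under a given (weather, surface) condition, and the hypothetical `adaptive_over` checks that belief across every condition in the class.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

# Hypothetical encoding: an operating condition is a (weather, surface) pair.
Condition = Tuple[str, str]
# A paradigm's "model" returns True when the paradigm gives us justified
# belief that the artifact achieves a functioning under a condition.
Model = Callable[[str, Condition], bool]


def adaptive_over(model: Model, functionings: Iterable[str],
                  conditions: Iterable[Condition]) -> bool:
    """Toy adaptivity check: every target functioning is believed to be
    achieved under every operating condition in the class."""
    conds = list(conditions)
    return all(model(f, c) for f in functionings for c in conds)


def car_model(functioning: str, condition: Condition) -> bool:
    """Toy car: transports in any weather, on any surface except water."""
    _weather, surface = condition
    return functioning == "transport" and surface != "water"


road_conditions = list(product(["sun", "rain", "snow"], ["highway", "gravel"]))
with_water = road_conditions + [("sun", "water")]

print(adaptive_over(car_model, ["transport"], road_conditions))  # True
print(adaptive_over(car_model, ["transport"], with_water))       # False
```

The second call shows that adaptivity is always relative to how the class of operating conditions is scoped: the same paradigm and artifact pass or fail depending on which conditions we demand assurances over.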
Operating conditions can also include history-dependencies, for example claims like "the car should remain working even after having traveled for 10,000 km over proper roads".
For extremely powerful AI systems, however, where future conditions are further shaped by the optimizing behavior of the AI systems themselves, the principle of adaptivity also entails a principle of adaptive consistency: the system should not self-modify into another artifact that cannot be reasoned about via the given design paradigm.
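Adaptive consistency can be sketched the same way, under an equally toy assumption: the artifact's state is reduced to a single hypothetical "scope" number, `self_modify` is the artifact's own update step, and the hypothetical `within_scope` stands in for the set of artifacts the paradigm can reason about. The point the sketch tries to capture is that the check must hold along the whole self-modification trajectory, not just for the artifact as initially built.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def adaptively_consistent(initial: State,
                          self_modify: Callable[[State], State],
                          in_paradigm: Callable[[State], bool],
                          horizon: int) -> bool:
    """True if every state reached within `horizon` steps of
    self-modification (initial and final states included) stays
    inside the paradigm."""
    state = initial
    for _ in range(horizon):
        if not in_paradigm(state):
            return False
        state = self_modify(state)
    return in_paradigm(state)


def within_scope(scope: int) -> bool:
    # Hypothetical paradigm: it can only reason about systems of scope <= 3.
    return scope <= 3


def stay(scope: int) -> int:
    return scope  # never changes its own scope


def grow(scope: int) -> int:
    return scope + 1  # widens its own scope at every step


print(adaptively_consistent(1, stay, within_scope, horizon=10))  # True
print(adaptively_consistent(1, grow, within_scope, horizon=10))  # False
```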
Economy: For some target class of functionings $F$ such that the paradigm $P$ allows us to construct an artifact $A$ to achieve $F$, $P$ provides us with the means to justifiably believe that $A$ will achieve the functionings $F$ given some resource budget $B$, such that the economic uses of $F$ are proportionate to $B$.
Note that while a lack of economy does not completely preclude exploring a design paradigm at moderate scales, economic viability does shape the widespread use of engineering artifacts and the scientific attention invested in developing them further.
More importantly, design paradigms can also inform speculation about future economic viability, allowing paradigms to mature even before they are economically viable. This often happens through economic institutions that allow making bets on the maturation of competing paradigms.
In machine learning, paradigms can often claim economic viability either by claiming a more general class of achievable functionings (that is, the ability to exhibit more general intelligent behavior, or competence on a broader variety of tasks), or by claiming efficiency in the resource budget, that is, in the amount of data, compute and other resources needed to reach the same level of competence.
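A minimal sketch of the economy criterion, assuming one possible reading of "proportionate": the paradigm must achieve every target functioning within the budget, and the value of those functionings must be at least a fixed multiple (`min_ratio`) of that budget. The function `economical` and all task names, costs, and values below are hypothetical. The two pairs of calls mirror the two routes described above: widening the class of achievable functionings, and lowering the resources needed for the same ones.

```python
from typing import Dict, List


def economical(cost: Dict[str, float], value: Dict[str, float],
               targets: List[str], budget: float,
               min_ratio: float = 1.0) -> bool:
    """Toy economy check.

    cost:  the paradigm's estimated resource cost per functioning; a missing
           key means the paradigm cannot achieve that functioning at all.
    value: the economic value of each functioning (paradigm-independent).
    """
    if any(f not in cost for f in targets):
        return False
    total_cost = sum(cost[f] for f in targets)
    total_value = sum(value[f] for f in targets)
    return total_cost <= budget and total_value >= min_ratio * budget


value = {"translate": 5.0, "summarize": 4.0, "plan": 8.0}
narrow = {"translate": 3.0, "summarize": 2.0}               # cannot "plan"
general = {"translate": 3.0, "summarize": 2.0, "plan": 4.0}
efficient = {"translate": 1.5, "summarize": 1.0}             # cheaper, same reach

all_tasks = ["translate", "summarize", "plan"]
two_tasks = ["translate", "summarize"]

# Route 1: a more general class of functionings.
print(economical(narrow, value, all_tasks, budget=10))    # False
print(economical(general, value, all_tasks, budget=10))   # True
# Route 2: efficiency in resources.
print(economical(narrow, value, two_tasks, budget=4))     # False
print(economical(efficient, value, two_tasks, budget=4))  # True
```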
Controllability: For some target control behavior $F_c$ (classes of functionings parametrized by control settings $c$) such that the paradigm $P$ allows us to construct an artifact $A$ with control variables corresponding to $c$, $P$ provides us with the means to justifiably believe that $A$ will achieve the functionings $F_c$ whenever its control variables are set to $c$.
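A toy sketch of the controllability criterion, under assumed names: `cruise_control` is a hypothetical prediction of what the paradigm says a car will do for each speed setting, `desired` gives the target behavior for each setting, and `controllable` checks that the two agree at every setting in a given range. The top speed is a made-up parameter.

```python
from typing import Callable, Iterable


def controllable(predict: Callable[[int], int], target: Callable[[int], int],
                 settings: Iterable[int]) -> bool:
    """True if, for every control setting c, the behavior the paradigm
    predicts for the artifact equals the target behavior for c."""
    return all(predict(c) == target(c) for c in settings)


MAX_SPEED = 180  # hypothetical top speed of a toy car, in km/h


def cruise_control(c: int) -> int:
    """Predicted cruising speed when the speed dial is set to c."""
    return min(c, MAX_SPEED)


def desired(c: int) -> int:
    """Target behavior for setting c: cruise at exactly the set speed."""
    return c


print(controllable(cruise_control, desired, range(0, 121, 20)))  # True
print(controllable(cruise_control, desired, range(0, 201, 20)))  # False
```

As with adaptivity, the verdict depends on the range of control settings we ask the paradigm to vouch for.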
Different design and engineering problems require different forms of controllability. Controllability is often part of an artifact's economic viability, though more controllable artifacts are not always more economically competitive.
While ideally we would want both adaptivity and controllability to be reflected in the economic valuation of artifacts (and of the possible artifact classes that a paradigm enables), insofar as economic viability is measured by narrow performance, it can trade off against the other criteria.
In some sense, alignment tax can be treated as a measurement of the controllability-economy trade-off for AI paradigms, while the adaptivity-economy trade-off rhymes a bit with the robustness-performance or robustness-accuracy trade-offs in machine learning.
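One hypothetical way to make the controllability-economy trade-off concrete: score candidate designs on performance and controllability, and measure the tax as how much performance is given up by restricting to designs that clear a controllability bar. The function `alignment_tax` and the integer scores are made up for illustration, and this is a simplification rather than a standard definition of alignment tax.

```python
from typing import List, Optional, Tuple


def alignment_tax(designs: List[Tuple[int, int]],
                  min_control: int) -> Optional[int]:
    """Toy alignment tax: best performance of any design, minus the best
    performance among designs meeting a controllability threshold.

    designs: (performance, controllability) scores on arbitrary integer scales.
    Returns None if no design meets the threshold.
    """
    best_overall = max(perf for perf, _ in designs)
    feasible = [perf for perf, ctrl in designs if ctrl >= min_control]
    if not feasible:
        return None
    return best_overall - max(feasible)


designs = [(95, 2), (90, 6), (80, 9)]
print(alignment_tax(designs, min_control=0))   # 0
print(alignment_tax(designs, min_control=5))   # 5
print(alignment_tax(designs, min_control=8))   # 15
print(alignment_tax(designs, min_control=10))  # None
```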
4. Case Study: CAIS vs Agents
When I discussed Eric Drexler's Comprehensive AI Services (CAIS) model with several alignment researchers in 2019-20, specifically how it offered an alternative way to reason about the trajectory of AI progress that did not involve advanced agency, the criticisms[3] I heard fell into the following general directions:
- A well-integrated ecology of AI services will eventually give rise to emergent agents, whether through collusion, the emergence of a dominant power-seeking service, or other emergent structures.
- Solving alignment and governance for CAIS will inevitably require solving alignment of agents, as CAIS merely pushes the hard parts of the alignment problem into the alignment of security services or R&D services.
- AI agents can solve useful problems for humans that no reachable collection of AI services can, and therefore there will always be an economic incentive to pursue agentic systems.
- Given a large enough amount of data, compute and model size, integrated agents outperform a collection of AI services.
The first dimension of criticism amounts to a claim that we should focus on paradigms that help us reason about agents, because the model proposed by AI services is not adaptively consistent and will evolve into an agent-like system anyway. The second dimension challenges the controllability claims of CAIS, questioning whether the paradigm's control problem can be solved within the paradigm itself or inevitably requires invoking another paradigm.
Note that these are roughly orthogonal dimensions of criticism, at least in the sense of logical independence. It is possible that CAIS does evolve into agentic systems, but that solving the control problem for CAIS is easier than aligning agents, so that the agent that eventually emerges is an aligned one. Such a position might imply that aligning agents by building and reasoning about bounded-scope services is a more tractable strategy than trying to solve alignment for natively agentic systems (i.e. systems that are agentic at the time of conception).
Conversely, it is possible that solving the control problem for CAIS does involve solving alignment for agentic systems that must take the role of security services (or other core governance roles), but that the aggregate behavior of AI services never evolves into an integrated utility-maximizing agent[4]. Similarly, we can argue that criticisms 1 and 2 are each orthogonal to criticisms 3 and 4.
The third and fourth criticisms correspond to the economy criterion, claiming that the CAIS paradigm will never be economically competitive, either because it is constrained from achieving some set of economically valuable functionings that agents can, or because it is not as resource-efficient as agents in achieving comparable functionings.
When faced with a novel paradigm, we can use these criteria to assess in what ways it would dominate existing paradigms if successful, and to organize criticisms of its claims.
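As a small illustration of organizing criticisms this way, the sketch below tags the four criticisms from this section with the criterion each one targets (following the classification above), groups them, and lists the pairs that target different criteria. The helper names are hypothetical, and treating "targets a different criterion" as a signal of logical independence is a heuristic assumption of this sketch rather than something the classification itself establishes; here it happens to reproduce the pairs argued to be orthogonal above.

```python
from collections import defaultdict
from itertools import combinations

# Criticisms of CAIS, tagged with the criterion each one targets.
criticisms = {
    1: ("ecology of services yields emergent agents", "adaptivity"),
    2: ("controlling CAIS requires aligning agents", "controllability"),
    3: ("agents reach functionings no set of services can", "economy"),
    4: ("agents are more resource-efficient at scale", "economy"),
}


def group_by_criterion(items):
    """Map each criterion to the sorted list of criticism ids targeting it."""
    groups = defaultdict(list)
    for cid, (_summary, criterion) in sorted(items.items()):
        groups[criterion].append(cid)
    return dict(groups)


def cross_criterion_pairs(items):
    """Pairs of criticisms that target different criteria: a heuristic list
    of candidates for logical independence, not a proof of it."""
    return [(a, b) for a, b in combinations(sorted(items), 2)
            if items[a][1] != items[b][1]]


print(group_by_criterion(criticisms))
# {'adaptivity': [1], 'controllability': [2], 'economy': [3, 4]}
print(cross_criterion_pairs(criticisms))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
```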
(Parts of these ideas were produced during my work on CAIS in 2020-21 and presented at FHI. Some of those ideas were later refined in discussions with Rose Hadshar in late 2022, and the clarification of those ideas as they pertain to general theory of design happened during AI Futures Residency at Wytham in December 2022. I thank several people who have been involved in discussions and sanity-testing of the criteria discussed.)
[2] As an aside, this feature of design paradigms is why Drexler says CAIS is both descriptive and prescriptive, and why claims about CAIS are sometimes hard to reason about in a purely descriptive way. See Drexler, Reframing Superintelligence, 2019, pp. 44-46.
[3] Some of these criticisms can also be found in Richard Ngo's Comments on CAIS.
[4] See Chapter 6 of Drexler's Reframing Superintelligence.