Crossposted from LessWrong: https://www.lesswrong.com/posts/zjGh93nzTTMkHL2uY/the-intentional-stance-llms-edition
In memoriam: Daniel C. Dennett.
tl;dr: I sketch out what it means to apply Dennett's Intentional Stance to LLMs. I argue that intentional vocabulary is already ubiquitous in experimentation with these systems; what is missing is the theoretical framework to justify this usage. I aim to supply that framework and explain why the intentional stance is the best available explanatory tool for LLM behavior.
Choosing Between Stances
Why choose the intentional stance?
It seems natural to ascribe cognitive states to AI models, starting from the field's terminology, most prominently by calling it "machine learning" (Hagendorff 2023). This is very much unlike how other computer programs are treated. When programmers write software, they typically understand it in terms of what they designed it to execute (design stance) or make sense of it through its physical properties, such as the materials it is made of or the electrical signals propagating through its circuitry (physical stance). As I note, it is not that we cannot use Dennett's other two stances (Dennett 1989) to talk about these systems. It is rather that neither of them constitutes the best explanatory framework for interacting with LLMs.
To illustrate this, consider the reverse example. It is possible to apply the intentional stance to a hammer, although doing so generates no new information and does not optimally explain the behavior of the tool. What is apt for making sense of how hammers operate is instead the design stance. The same holds for tool-like computer programs: to use a typical program, there is no need to posit intentional states, and, unlike with LLMs, users do not engage in human-like conversation with the software.
More precisely, the reason why neither the design nor the physical stance suffices to explain and predict the behavior of LLMs is that state-of-the-art LLM outputs are in practice indistinguishable from those of human agents (Y. Zhou et al. 2022). It is possible to think about LLMs as trained systems or as stacks of graphics cards and neural network layers, but these descriptions make hardly any difference when one prompts them to be helpful for conversation and problem-solving. What is more, machine learning systems like LLMs are not programmed to execute a task but are trained to find the policy that will execute the task. In other words, developers do not directly code the information required to solve the problem they are using the AI for: they train the system to find the solution on its own, which requires the model to possess the necessary concepts. In that sense, dealing with LLMs is more akin to studying a developing biological organism, or perhaps raising a child, than to building a tool whose use is well understood prior to the system's interaction with its environment. The LLM can learn from feedback and "change its mind" about the optimal policy for its task, which is not the case for a standard piece of software. Consequently, there is a distinction to be drawn between tool-like and agent-like programs, and judging on a behavioral basis, LLMs fall into the second category. This conclusion renders the intentional stance (Dennett 1989) practically indispensable for the behavioral evaluation of LLMs.
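To make the contrast concrete, here is a minimal Python sketch (using scikit-learn; the tiny dataset and labels are invented purely for illustration) of the difference between a tool-like program whose behavior its author codes directly and a system that is trained to find its own policy:

```python
# A minimal sketch of the design-stance vs. trained-policy contrast.
# The rule-based "tool" below is fully specified by its author; the
# trained classifier instead finds its own decision policy from data.
# (Illustrative only; the toy dataset and labels are made up.)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def rule_based_sentiment(text: str) -> str:
    # Design stance: every behavior traces back to a line the programmer wrote.
    return "positive" if "good" in text.lower() else "negative"

# Trained policy: the developer supplies examples, not the decision rule itself.
texts = ["a good helpful answer", "a terrible confusing answer",
         "good and clear", "confusing and wrong"]
labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(texts)
classifier = LogisticRegression().fit(features, labels)

print(rule_based_sentiment("this is a good answer"))
print(classifier.predict(vectorizer.transform(["a clear helpful answer"]))[0])
```

The first function is exhaustively explained by the design stance; the second system's decision boundary is something the developer discovers after training rather than dictates in advance.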
Folk Psychology for LLMs
What kind of folk psychology should we apply to LLMs? Do they have beliefs, desires, and goals?
LLMs acquire "beliefs" from their training distribution: they do not memorize or copy text from it when producing their outputs, at least no more than human writers and speakers do. They must therefore model the text they have been trained on so that they can grasp regularities and associations within the data and effectively recombine tokens in different ways depending on the context of the input. This amounts to forming a statistical representation of the training data, which is ultimately what the LLM is. As far as "desires" are concerned, under certain prompting conditions, that is, conditions of querying trained models to generate responses (Naveed et al. 2023), LLMs have been observed to express desires related to practical matters of survival and persistence through time, for example a desire not to be shut down (van der Weij and Lermen 2023). Lastly, the "goal" of an LLM is contingent on the task it is assigned to complete. In a broader sense, the goal of the LLM is to model text, akin to how the goal of humans is reproduction.
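As a rough illustration of what such a statistical representation amounts to, the sketch below reads off a model's next-token distribution, which is one way to operationalize talk of the model "believing" that Paris, rather than London, is the capital of France. It assumes the Hugging Face transformers library and the small GPT-2 model; the prompt and candidate tokens are merely illustrative choices.

```python
# A rough sketch of how a "belief" can be operationalized as a statistical
# regularity in a model's next-token distribution. Uses GPT-2 via the
# Hugging Face transformers library; model choice and prompt are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores over the next token
probs = torch.softmax(logits, dim=-1)

for city in [" Paris", " London"]:
    token_id = tokenizer.encode(city)[0]
    print(f"P({city.strip()!r} | prompt) = {probs[token_id].item():.4f}")
```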
This framework underlies a user's interaction with a chat model. The user prompts it while modeling it as an interlocutor in the context of a conversation, expecting the LLM to output answers that correspond to accurate information, insofar as that information has appeared somewhere in the model's training data. As such, it is acceptable to say that "the model knows X from its training data" or that "the model aims to accomplish Y" relative to a certain objective. At this stage of their development (at the time of writing, the most capable available model is GPT-4), folk psychology provides a practical setup for studying LLM behavior, communicating findings, and evaluating capabilities, using a vocabulary that ascribes cognitive states to the models. This is justified because most outputs of GPT-4 and models at a similar level cannot be distinguished from those of a human agent.
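In practice, the interaction looks something like the sketch below, which assumes the OpenAI Python SDK and an API key in the environment; the model identifier and the question are illustrative placeholders. The point is only that the natural description of the exchange is already intentional.

```python
# A hedged sketch of prompting a chat model and recording the exchange in
# intentional vocabulary ("the model knows / aims to ..."). Assumes the OpenAI
# Python SDK with an API key in the environment; model name is illustrative.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What year did the Apollo 11 mission land on the Moon?"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ],
)

answer = response.choices[0].message.content
# Intentional-stance bookkeeping: we describe the output as what the model
# "knows" from its training data, not as a lookup the programmer hard-coded.
print(f"The model reports that it knows: {answer}")
```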
Human-Level Performance
What are some examples where LLM outputs are at the human level?
Some examples of human-level performance can be found in GPT-4 passing, among other exams, the bar exam (Katz et al. 2024; Achiam et al. 2023), in poetry writing (Popescu-Belis et al. 2023), and in theory-of-mind interactions with the model (Strachan et al. 2023). These remarkable performances have prompted some researchers to suggest that GPT-4 might already be showing signs of general intelligence (Bubeck et al. 2023). While the definition of general intelligence remains controversial, directly observing the models' capabilities is at minimum convincing evidence that these systems have reached a completely new level of cognitive power compared to their symbolic AI predecessors.
Machine Psychology
Research in the emerging field of machine psychology points to a series of experiments in which LLMs of the GPT-3 series are tested for their decision-making, information search, deliberation, and causal reasoning abilities and are found to perform at the same level as human subjects or better (Binz and Schulz 2023). Furthermore, cognitive psychology experiments with GPT-4 specifically attempt to illuminate the reasoning capabilities of these systems from an intentional stance point of view, as they "rely on a complex and potent social practice: the attribution and assessment of thoughts and actions" (Singh, SB, and Malviya 2023). They do so by testing GPT-4's abilities against benchmarks of commonsense questions, problems from scholastic mathematics contests, and challenging language understanding tasks. The high-accuracy results lead the researchers to be confident about the model's cognitive psychology capabilities. Developmental psychologists have also tested LLMs against metrics used for the study of human cognitive development (Kosoy et al. 2023). Interestingly, in an experiment with Google's LLM LaMDA, the model's responses were similar to those of human children on social and proto-moral understanding tasks, while it underperformed on real-world tasks that perhaps require interaction with and exploration of the physical environment.
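Schematically, these machine-psychology evaluations treat the model as a participant whose answers are scored the way a human subject's would be. The sketch below is a hypothetical skeleton of that workflow; the ask_model helper and the two benchmark items are placeholders of my own, not items from the cited studies.

```python
# A minimal sketch of the machine-psychology style of evaluation: pose
# benchmark questions to a model and score accuracy, just as one would score
# a human participant. ask_model() and the items are hypothetical placeholders.

def ask_model(question: str) -> str:
    """Placeholder for a call to the LLM under study; returns its answer."""
    raise NotImplementedError("wire this up to the model under study")

benchmark = [
    {"question": "If you drop a glass on a stone floor, what is likely to happen?",
     "expected": "it breaks"},
    {"question": "What is 17 + 25?", "expected": "42"},
]

def run_benchmark(items) -> float:
    correct = 0
    for item in items:
        answer = ask_model(item["question"]).lower()
        correct += item["expected"] in answer   # crude string-match scoring
    return correct / len(items)

# accuracy = run_benchmark(benchmark)   # e.g. 0.5 means one of two items passed
```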
The Analogy of Alien Minds
Dennett offers a useful analogy and describes how anyone who observed highly capable aliens would be inclined to apply the intentional stance when thinking about their cognition. He writes:
Suppose, to pursue a familiar philosophical theme, we are invaded by Martians, and the question arises: do they have beliefs and desires? Are they that much like us? According to the intentional system theory, if these Martians are smart enough to get here, then they most certainly have beliefs and desires—in the technical sense proprietary to the theory — no matter what their internal structure, and no matter how our folk-psychological intuitions rebel at the thought. (Dennett 1989, 60)
The Martian metaphor works well for LLMs. We observe how these models complete a series of complicated cognitive tasks and wonder whether, or in what sense, they could have minds like ours. The intentional stance answer is that the intelligent behavior of LLMs is evidence that they are comparable to human agents and should therefore be treated as having minds, with beliefs and desires. If the Martians have managed to travel all the way to Earth, there is no option but to model them as intentional agents. If LLMs showcase similar behaviors that require capabilities for setting goals and aptly making plans to achieve them, then one must model them using the intentional stance. The more intelligent their behavior becomes, the more we should expect them to have properties of human minds, despite all the structural differences in their internal architecture.
Advantages of the Intentional Stance Approach
The application of the intentional stance has facilitated research in at least three respects. First, it has allowed for quantitative measurement of LLM capabilities through the construction of behavioral benchmarks and the borrowing of experimental techniques from human psychology. Second, it has made clear that the deep learning paradigm differs from older attempts to create thinking machines; for that reason, additional caution is warranted, as we may be dealing with some kind of minds, with all the challenges that accompany that. And third, it has broadened the range of experimentation used to explore and test the breadth of the models' capabilities.
At a more theoretical level, the effectiveness of the intentional stance shows it to be a reliable tool for explaining the behavior of LLMs and, further, for modeling AI cognition. Firstly, as Dennett points out, brains are supposed to discern meaning (semantic engines) but can only approximate this; in reality, they only discern inputs according to structural, temporal, and physical features (syntactic engines) (Dennett 1989). If human brains can be studied as syntactic engines that are nevertheless capable of acquiring and possessing concepts, then the same may apply to LLMs, especially once they have mastered human-level tasks. Moreover, it seems apt to argue that one reason LLMs can approximate meaning in a way analogous to human brains is that they have been trained on large and varied human-generated datasets and finetuned specifically for human-level tasks. If humans can only pick up meanings by picking up things in the world that highly correlate with those meanings, the same could apply to LLMs. They would, like humans, be "capitalizing on close (close enough) fortuitous correspondences between structural regularities" (Dennett 1989, 61).
Secondly, following Dennett, the point of modeling cognitive systems according to the intentional stance is that we evaluate them on a behavioral basis, and that is all there is to evaluate. The skeptic might reply that there is more to learn about these systems than the mere appearance of human-like behavior, or that ontological commitments must be made. To that objection, the Dennettian can only suggest that the intentional stance serves as a pragmatic tool for tracking differences that could not go unnoticed, since they impact how systems behave in the world. In other words, modeling LLMs as intentional systems calls for a deflationary approach to cognition, where all there is to account for is how well our tools allow for prediction, explanation, and overall interaction with these systems on the basis of their observable behavior.
Thirdly, and as a corollary, the cognitive capabilities of LLMs necessitate talk of cognitive states in LLMs, thus rendering the intentional stance indispensable. For example, when examining how models reason, or whether they could develop properties such as situational awareness (Berglund et al. 2023), there is no option but to employ mental vocabulary. The application of the intentional stance as I have described it moves from the descriptive observation that AI researchers speak about LLMs in intentional terms to the prescriptive claim that this is the optimal explanatory framework for modeling state-of-the-art LLM behavior.
References
Achiam, Josh, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat. 2023. “GPT-4 Technical Report.” arXiv Preprint arXiv:2303.08774.
Berglund, Lukas, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, and Owain Evans. 2023. “Taken out of Context: On Measuring Situational Awareness in LLMs.” arXiv Preprint arXiv:2309.00667.
Binz, Marcel, and Eric Schulz. 2023. “Using Cognitive Psychology to Understand GPT-3.” Proceedings of the National Academy of Sciences 120 (6): e2218523120. https://doi.org/10.1073/pnas.2218523120.
Bubeck, Sébastien, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, and Scott Lundberg. 2023. “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” arXiv Preprint arXiv:2303.12712.
Dennett, Daniel C. 1989. The Intentional Stance. Reprint edition. Cambridge, Mass.: A Bradford Book.
Hagendorff, Thilo. 2023. “Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods.” arXiv Preprint arXiv:2303.13988.
Katz, Daniel Martin, Michael James Bommarito, Shang Gao, and Pablo Arredondo. 2024. “GPT-4 Passes the Bar Exam.” Philosophical Transactions of the Royal Society A 382 (2270): 20230254.
Kosoy, Eliza, Emily Rose Reagan, Leslie Lai, Alison Gopnik, and Danielle Krettek Cobb. 2023. “Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses.” arXiv Preprint arXiv:2305.11243.
Naveed, Humza, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, and Ajmal Mian. 2023. “A Comprehensive Overview of Large Language Models.” arXiv Preprint arXiv:2307.06435.
Popescu-Belis, Andrei, Alex R Atrio, Bastien Bernath, Étienne Boisson, Teo Ferrari, Xavier Theimer-Lienhardt, and Giorgos Vernikos. 2023. “GPoeT: A Language Model Trained for Rhyme Generation on Synthetic Data.” Association for Computational Linguistics.
Singh, Manmeet, Vaisakh SB, and Neetiraj Malviya. 2023. “Mind Meets Machine: Unravelling GPT-4’s Cognitive Psychology.” arXiv Preprint arXiv:2303.11436.
Strachan, James, Dalila Albergo, Giulia Borghini, Oriana Pansardi, Eugenio Scaliti, Alessandro Rufo, Guido Manzi, Michael Graziano, and Cristina Becchio. 2023. “Testing Theory of Mind in GPT Models and Humans.”
Weij, Teun van der, and Simon Lermen. 2023. “Evaluating Shutdown Avoidance of Language Models in Textual Scenarios.” arXiv Preprint arXiv:2307.00787.
Zhou, Yongchao, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2022. “Large Language Models Are Human-Level Prompt Engineers.” arXiv Preprint arXiv:2211.01910.
Thanks for this exploration.
I do think that there are some real advantages to using the intentional stance for LLMs, and I think these will get stronger in the future when applied to agents built out of LLMs. But I don't think you've contrasted this with the strongest version of the design stance. My feeling is that this is not taking humans-as-designers (which I agree is apt for software but not for ML), but taking the training-process-as-designer. I think this is more obvious if you think of an image classifier -- it's still ML, so it's not "designed" in a traditional sense, but the intentional stance seems not so helpful compared with thinking of it as having been designed-by-the-training-process, to sort images into categories. This is analogous to understanding evolutionary adaptations of animals or plants as having been designed-by-evolution.
Taking this design stance on LLMs can lead you to "simulator theory", which I think has been fairly helpful in giving some insights about what's going on: https://www.lesswrong.com/tag/simulator-theory