At first glance, rationality seems like applied intelligence: a skill you can master by ignoring emotions and guiding your decisions with SWOT analyses. However, Joe Carlsmith's essays[1] made me appreciate that moral questions benefit from being able to feel the weight and shape of very fuzzy emotions and intuitions - and that rationality might therefore be a skill of combining "cold analyses" with metacognition and emotional regulation (noticing and mitigating motivated reasoning).
This combination often takes the form of thought experiments, in which we let our imagination simulate a world where something is different and observe how our intuitions respond. Just like real experiments, thought experiments come with their own confounding variables, as the mind takes on the roles of the participant, the experimenter, the decision-maker, the data-analysis program, and the lab building itself.[2] However, their distortions often seem usefully regular, as captured by Tversky and Kahneman's "mental accounting" experiments - since even fuzzy estimates originate from specific mental operations. For instance, to estimate expected value, we use the affect heuristic for value (how good or bad a scenario would feel) and the simulation heuristic for probability (how vivid a scenario seems).
Connectionism proposes that the brain does not store knowledge in a clear hierarchy; intuitions are often constructed only in the moment, when a stimulus activates certain neural patterns. This principle seems particularly important when we think about preferences and moral intuitions, as the errors of this process can accumulate for topics involving emotions, abstract ideas, and introspection. That is partly because, to evaluate them, the brain first needs to create its own stimuli, to which it then responds - and partly because each of those aspects creates its own set of unique biases.
Therefore, an interesting research question for EA academics within psychology, philosophy, and cognitive science may be:
Which methods of prompting the mind elicit the most valid intuitions?
The (needlessly) big picture
Alignment
This research direction could be viewed as part of a larger project of developing a theory of human reasoning, which could have some speculative overlaps with alignment (the value-loading problem) - a cluster of questions such as: How should an AI ask humans about their preferences? Which methods of extrapolating the mind are most valid or legitimate? What actually happens when an AI attempts to integrate or extrapolate the preferences of one person - or of humanity as a whole? How does it confront inconsistencies or tensions with its own world model - as well as perceived deficiencies in the human's reasoning or morality?
We can see human thinking as an elementary case of extrapolation - a person repeats a cycle of "prompting oneself" and "observing the elicited intuitions", perhaps in a manner similar to chain-of-thought models. However, I suspect there are many methods of extrapolation that may differ in significant ways[3] - and similarly, there might be many ways of conducting reasoning that yield different sets of preferences. So, part of the success may depend on successfully mapping out the analogies and disanalogies between how a similar process occurs in humans and in AIs.
The recent success of LLMs that rely heavily on chain-of-thought reasoning seems to lend some support to the scalable oversight model, where the control problem can be framed as a contest between a human supervisor/trainer enhanced by a narrow AI and a general AI. This model opens up space for two research directions in cognitive science.
Firstly, this plan may benefit from research into how to model chain-of-thought (CoT) logic so that it corresponds to an idealized human searching for the most rational and moral solutions - while mitigating the biases that an AI playing a human creates. Clarifying what such a CoT looks like for humans may help here.
Secondly, when humans are employed to rate outputs, clarifying the biases that arise in their decisions may be useful - so that companies can reduce the weight of training instances in proportion to the biases they exhibit. We could also imagine research studying which biases arise when raters have access to a "trusted dumb model" that advises their decisions based on the possible biases it detects.
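To make the down-weighting idea concrete, here is a minimal sketch, entirely my own illustration with hypothetical names (no existing training pipeline works exactly like this): per-rating bias estimates are turned into weights that would multiply each instance's loss term during training.

```python
import numpy as np

# Hypothetical sketch: "bias_scores" (0 = no detected bias, 1 = fully bias-driven)
# is an assumed output of some upstream bias-detection step, not an existing tool.
def rating_weights(bias_scores, floor=0.2):
    """Map per-rating bias estimates in [0, 1] to training weights in [floor, 1]."""
    return np.clip(1.0 - np.asarray(bias_scores), floor, 1.0)

# e.g. a rating judged 70% likely to reflect a known bias gets weight 0.3;
# a rating with no detected bias keeps full weight.
print(rating_weights([0.0, 0.7, 0.95]))  # -> [1.  0.3 0.2]
```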
Prioritization
Improving our theory of reasoning could also be generally useful in moral psychology, forecasting, and decision-making.
The model above suggests that already existing AI models could serve as exemplars of AI used as a decision-making facilitator. Such an AI assistant could be tasked with helping to clarify all the factors involved in some decision and to map out the cruxes. In the optimal form, such an assistant would guide the human based on our best theory of reasoning - perhaps helping them elicit their implicit intuitions, observe how those intuitions interact in their mind through a series of thought experiments, and finally quantify their uncertainty around different approaches.
If the decision were about forecasting an important future event, the output could look like an estimate of its probability. If it concerned uncertainty about the correct theory of consciousness, the assistant could help the human formulate what weight they assign to different theories. Differentially advancing these kinds of AIs may also help reduce some of the risks associated with blindly delegating decisions to AI.
Forecasting techniques: A toy example
One simple way to operationalize "valid intuitions" is to look at the thoughts forecasters use as "self-prompts" to convert their intuitions about the future into numerical probability estimates - and to track which of them correlate with the highest proportion of successful predictions.
My shallow review of the forecasting literature suggests Karvetski et al. (2022) may be the closest attempt to distinguish forecasting strategies qualitatively: the researchers scored forecasts based on their complexity and use of reference classes. Therefore, inspired by Scott Alexander, Gwern, the Sternbergs[4], and LLMs, I put together a list of forecasting strategies that goes down to the level of specific mental operations.
To measure how successful each of these strategies is, I created a market on the betting site Manifold, where people could bet on which strategies would correlate with a low Brier score (as a measure of good forecasting ability) in the results of a survey attached to the market. The market was heavily subsidized in order to avoid bandwagon effects where people answer the poll strategically. The only goal of the market itself was to attract attention to the survey, in which Manifold users were asked about the strategies they find useful while forecasting.
Luckily, user Ziddletwix alerted me that the Brier score is heavily biased in this context, as it gives a significant bonus to bettors who choose markets that are already close to 0% or 100%. Therefore, to evaluate the results, I instead calculated the Brier Skill Score, which corrects for this advantage by comparing each bet against a market baseline: the market probability just before the bet. For details, see my question and answer here, or try the calibration app that now calculates both of these scores (thanks to sufftea for adding my code).
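For illustration, here is a minimal sketch of that calculation as I understand it - my own reconstruction, not Manifold's or the calibration app's actual code:

```python
import numpy as np

def brier(probs, outcomes):
    """Mean squared error between probabilistic forecasts and 0/1 outcomes."""
    return np.mean((np.asarray(probs) - np.asarray(outcomes)) ** 2)

def brier_skill_score(bettor_probs, market_probs_before, outcomes):
    """Skill relative to the market baseline: the market probability just before each bet."""
    bs_bettor = brier(bettor_probs, outcomes)
    bs_baseline = brier(market_probs_before, outcomes)
    return 1.0 - bs_bettor / bs_baseline  # > 0 means beating the market

# A bettor who simply matches an already-confident market gets an excellent raw
# Brier score, but zero skill:
print(brier([0.97], [1]))                     # ~0.0009 (looks great)
print(brier_skill_score([0.97], [0.97], [1]))  # 0.0 (no improvement over the market)
```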
In the survey, respondents were asked: "Imagine you are trying to predict a complex global event happening 5 years from now. Imagine you have 1 hour to come up with a prediction and you cannot use the internet. For each technique, please indicate how useful you would find it in this exercise." Respondents then rated the usefulness of each technique from 1 (not useful at all) to 5 (extremely useful).[5] The prompt was formulated to capture as much as possible of the variance related to strategies of prompting the mind, as opposed to different methods and habits related to gathering information.
Below is a table with the selected strategies. Usefulness shows the average perceived usefulness of the technique, based on the 1-5 rating. Correlation with Brier Skill Score shows the degree of association between perceived usefulness and good forecasting skill. The adjusted correlation shows how these associations change when we transform the usefulness ratings so that every respondent's average rating is zero (i.e. when we control for the fact that better forecasters turned out to be more skeptical of all strategies); a code sketch of both correlations follows the table. For both measures, I used Pearson's r with significance calculated for a two-tailed hypothesis.
| Strategy | Usefulness (M) | Correlation with BSS | p | Adjusted correlation with BSS | p |
|---|---|---|---|---|---|
| Mental simulation: Vividly and precisely imagining what different scenarios might look like. This might include putting yourself in the shoes of key actors and seeing what motives, means, and opportunities they have. | 2.53 | -0.108 | 0.506 | 0.042 | 0.796 |
| Working backwards: Imagining a scenario has already happened or failed to happen, and working backwards to identify the key factors that led there. Or thinking about what a prediction implies about how the world is today. | 2.93 | -0.154 | 0.342 | -0.007 | 0.968 |
| Dialectic: Exchanging back-and-forth arguments with yourself. Testing whether predictions sound convincing when you try to express them in concrete words or even formal arguments. | 2.90 | -0.377 | 0.018 | -0.280 | 0.080 |
| Reference classes: Thinking about whether a scenario is a typical or an atypical member of some broader class of scenarios. | 3.70 | 0.140 | 0.390 | 0.256 | 0.111 |
| Decomposition: Breaking down a complex scenario into smaller, more easily estimable components and checking whether the conditional probabilities fit your intuitions, e.g. thinking about the 3 steps necessary for an event to happen. | 3.80 | -0.145 | 0.372 | 0.052 | 0.751 |
| Contrasting: Asking yourself whether A or B sounds more likely (e.g. comparing two candidates with a 5% chance). This can include adding up probabilities and asking "How likely is it that neither A nor B happens?" | 2.90 | -0.291 | 0.068 | -0.057 | 0.726 |
| Reflecting on a bias: Thinking about the possible biases that influence your estimate or the estimates of others. | 3.51 | -0.262 | 0.107 | -0.008 | 0.960 |
| Prize gut check: Imagining how you would feel if you lost or won a bet - i.e. whether a given sum "offered" by the market seems worth it, similarly to how you would think about buying a product. | 2.35 | -0.071 | 0.665 | 0.095 | 0.559 |
| Frequency gut check: Visualizing the meaning of a certain percentage. For instance, imagining 10 bubbles representing 10 worlds, or imagining how a chance compares to throwing a die (16.7%). | 2.43 | -0.239 | 0.137 | -0.104 | 0.522 |
| Surprise check: Checking how surprising you would find it if the likelihood turned out to be different. For instance, if a market predicts a 30% chance, would I find it weird if it predicted 50%? | 2.58 | -0.169 | 0.296 | 0.039 | 0.809 |
| Robustness check: Thinking about how your estimate might change if some new evidence emerged, if something new happened, or if you changed your mind about some smaller piece of the puzzle. | 3.38 | -0.222 | 0.168 | -0.075 | 0.647 |
| Meta-strategy: Noticing how different techniques, thoughts, or predictions contrast and interact in your mind. | 2.58 | -0.220 | 0.172 | -0.113 | 0.488 |
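As promised above, here is a rough sketch of how the two correlation columns were computed. The data layout and column names are my illustrative assumptions (one row per respondent, one column per strategy rating, plus a column with their Brier Skill Score); the actual analysis may have been organized differently.

```python
import pandas as pd
from scipy.stats import pearsonr

def strategy_correlations(df, strategy_cols, score_col="bss"):
    """Raw and adjusted Pearson correlations between usefulness ratings and BSS."""
    # Raw: each strategy's ratings against forecasting skill (pearsonr returns r and a two-tailed p).
    raw = {s: pearsonr(df[s], df[score_col]) for s in strategy_cols}
    # Adjusted: subtract each respondent's mean rating, so that overall
    # skepticism toward all strategies is controlled for.
    centered = df[strategy_cols].sub(df[strategy_cols].mean(axis=1), axis=0)
    adjusted = {s: pearsonr(centered[s], df[score_col]) for s in strategy_cols}
    return raw, adjusted
```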
The strategies forecasters found most useful were decomposition, reference classes, and the robustness check. The least useful strategy was the "prize gut check" - which makes sense in light of Kahneman and Tversky's work demonstrating that when people make bets, the subjective value of a bet has a very non-linear relationship with the perceived probability. Unfortunately, it is precisely this skill that prediction markets might train people to develop the most.
However, none of the correlations between forecasting skill and preferred techniques came out as significant, which suggests that either:
a) My method of tracking forecasting skills did not capture much of the actual variance.
b) My "atomist" approach to separating forecasting techniques does not capture important differences in forecasting strategies.
Nevertheless, perhaps this research can serve as inspiration for the design of a study estimating the validity of different methods for eliciting intuitions. Firstly, it is possible my taxonomy of techniques is flawed, and a better one could be developed based on qualitative research with superforecasters. Secondly, it is possible that self-reporting biases the results - to which one may respond with experiments pitting different interventions against each other. Thirdly, it is possible the techniques themselves do not matter much, e.g. in contrast to personality traits such as intelligence or scout mindset. Fourthly, some feedback suggests that, to some forecasters, it was not clear what I meant by some of the strategies.
Other project ideas
When it comes to forecasting, it is easy to evaluate what good thinking means. But what does it mean to be able to express one's own true preferences well? Such a question would probably benefit from a joint team of psychologists (or cognitive scientists) and philosophers. Still, I see several options for how to operationalize this concept well enough to compare the effectiveness of different eliciting methods.
Many relevant research projects would fall under the umbrella of experimental philosophy. One possibility here is to study the cognitive processes that distinguish biases from fundamental moral intuitions. One method of testing the validity of an intuition is to see whether it holds up among people who have a substantial scout mindset or experience with meditation and philosophy. Intuitions and thought processes that correlate negatively with cognitive skills like accuracy-oriented reasoning or numeracy are more likely to track a general bias. One important intuition to study could be the question of to what extent Sharon H. Rawlette is correct in her phenomenology of moral value - her thesis that "ought-to-be-ness" is a core element of people's moral intuitions. I myself am exploring a project on the relationship between perception and intuitions about consciousness (the meta-problem of consciousness).
Another approach is to model people's extrapolated preferences "from the outside", e.g. by experimentally testing whether the strength of those intuitions varies when different thoughts, emotions, or habits are induced (priming). For instance, some experiments asked people to choose a favourite picture or poster, later asked them how satisfied they were with their choice, and evaluated how self-consistent their evaluations were depending on how deeply they were prompted to think about them (see Dijksterhuis et al., 2006). A similar strategy is used in studies of "correct voting" or "rational voting", which propose that we can, in essence, extrapolate people's preferences by asking experts which voting strategy would actually bring about the desired results. By comparing laypeople and experts, while controlling for some personality measures, we can model how preferences systematically change as a function of insight (Lau et al., 2008; Caplan, 2011).
- ^
Most recently, Fake thinking and real thinking; but Otherness and control is also an excellent demonstration of these moves.
- ^
I stole this joke from Carl Jung: "The dreamer is at once scene, actor, prompter, stage manager, author, audience, and critic."
- ^
For instance, we could distinguish:
CEV = The original system A (e.g. a brain emulation) can request to be altered in any dimension. An outside system B makes small adjustments, and at each step it checks with system A whether it wants to continue or alter this extrapolation.
Info-extrapolation = The original system (e.g. brain emulation) can call up a genie that answers any question.
Cogni-extrapolation = The original system gains access to increasingly greater capacities of attention and memory.
Minimal extrapolation = System B gives the original system many thought experiments (e.g. similar to trolley problems) that determine the implicit hierarchy of preferences (and meta-preferences). System B then designs an extrapolated hierarchy, based on the smallest possible alterations that dissolve the inconsistencies between the preferences.
- ^
Sternberg, R. J., & Sternberg, K. (2016). Problem solving and creativity. In Cognitive psychology (7th ed., pp. 399–438). Cengage Learning.
- ^
This scale was anchored at each point using the following formulations:
1 = not useful at all
2 = slightly useful
3 = somewhat useful
4 = very useful
5 = extremely useful