Summary: I considered the question Under what conditions should the targets of EA funding be chosen randomly? I reviewed publications on the performance of random decision strategies, which I initially suspected might support randomized funding in some situations. In this post, I explain why I now think these publications provide very little guidance on funding allocation. Overall I remain uncertain whether one could improve, say, EA Grants or Open Phil’s grantmaking by introducing some random element.
I spent about 90 hours on this project as part of a Summer Research Fellowship at the Centre for Effective Altruism.
Introduction
Research question, scope, and terminology
Research question. The question I aimed to investigate was: Under what conditions should the targets of EA funding be chosen randomly? To this end I also looked at the performance of random strategies for decisions other than funding. Since I never found positive conclusions about funding, the intended focus on funding isn’t well reflected in this report, which focuses on negative conclusions.
Criteria for evaluating decision strategies. I was most interested in ex-ante effectiveness, understood as maximizing the expected value of some quantity before a decision. I didn’t investigate other criteria such as fairness, cost, or incentive effects, but will sometimes briefly discuss these.
Scope. I was thinking of situations where an EA individual or organization allocates funding between targets that would then use this funding with the ultimate aim of helping others. One example is donations to charities. I was not interested in transfers of money prior to such a decision; in particular, the use of lotteries to pool resources and enable economies of scale – as in donor lotteries – was outside the scope of this project. Neither was I interested in situations where financial transfers are intended primarily to help the recipients, as in social welfare or GiveDirectly’s cash transfers. Finally, I did not consider how the funding decisions of different funders interact, as in questions around funging. [1]
Terminology. I’ll use random strategy or lottery to refer to decision mechanisms that use deliberate [2] randomization, as in flipping a coin. The randomization need not be uniform, e.g. it could involve the flip of a biased coin. Lotteries can use mechanisms other than randomization; for example, allocating only a part of all funding randomly or randomizing only between a subset of options would count as lotteries. I’ll refer to strategies that aren’t lotteries as nonrandom.
Why I was interested
Several publications claim that, when allocating resources, lotteries can outperform certain nonrandom strategies such as grant peer review or promotions based on past performance. I found such claims for promotions in hierarchical organizations, selecting members of a parliament, financial trading, and science funding (see section Survey of the literature for references). They were based on both empirical arguments and mathematical models.
I was curious whether some of these findings might apply to decisions frequently encountered by EAs. For example, the following are common in EA:
- Individual donors selecting a charity.
- Institutional funders sponsoring individuals, e.g. CEA’s EA Grants, Open Phil’s AI Fellows Program.
- Relatively conventional science funding, e.g. Open Phil’s Scientific Research category or some of their AI safety and strategy grants.
For reference, I’m including a link to my original research proposal with which I applied to CEA’s Summer Research Fellowship.
Survey of the publications I reviewed
This table exhibits the most relevant papers I reviewed. Note that my review wasn’t systematic; see the subsection Limitations for more detail on this and other limitations.
Some context on the table:
- The column on Type of argument uses the following categories and subcategories:
- Deductive: Mathematical proofs and analytic solutions – as opposed to approximations – of quantitative models.
- Empirical: Arguments based on data or anecdotes from the real world.
- Qualitative: Non-quantitative discussion, usually case studies.
- Survey: Responses of human subjects to standardized questionnaires.
- Retrospective: Data from non-experimental settings covering some extended period of time in the past.
- Lab experiments: Data from laboratory experiments of the type common in psychology and behavioral economics.
- MC simulation: Monte Carlo simulation, i.e. repeatedly running a stochastic algorithm and reporting the average results.
- Agent-based model: Compositional models that include parts intended to represent individual agents, typically tracking their development and interaction over time to investigate the effects on some aggregate or macro-level property.
- Simple model: Here denotes any quantitative model that isn’t clearly agent-based.
- Comprehensive: Paper provides or cites arguments from several of the above types to make an all-things-considered case for its conclusion.
- I also reviewed publications that don’t explicitly discuss lotteries when I suspected that their content might be relevant. Usually this was because they seemed to reveal some shortcoming of nonrandom strategies. In these cases I report the most relevant claim in the column Stance on lotteries.
- The table only covers those parts of a publication that are relevant to my topic. For example, Frank’s (2016) simulations are only a part of his overall discussion, which culminates in him arguing for a progressive consumption tax.
Strength and scope of the endorsement of lotteries in the literature
Several publications I reviewed make claims that are about decision situations in the real world, or at least can reasonably be interpreted as such by readers who only read, say, their abstracts or conclusion sections. Examples include:
“The proposals [for introducing random elements into research funding] have been supported on efficiency grounds, with models, including social epistemology models, showing random allocation could increase the generation of significant truths in a community of scientists when compared to funding by peer review.” (Avin, 2018, p. 1)
“We also compare several policy hypotheses to show the most efficient strategies for public funding of research, aiming to improve meritocracy, diversity of ideas and innovation." (Pluchino et al., 2018, p. 1850014-2)
“This means that a Parliament without legislators free from the influence of Parties turns out to be rather inefficient (as probably happens in reality).” (Pluchino et al., 2011b, p. 3948)
“In conclusion, our study provides rigorous arguments in favor of the idea that the introduction of random selection systems, rediscovering the wisdom and the history of ancient democracies, would be broadly beneficial for modern institutions.” (Pluchino et al., 2011b, p. 3953)
“Finally, we expect that there would be also several other social situations, beyond the Parliament, where the introduction of random members could be of help in improving the efficiency.” (Pluchino et al., 2011b, p. 3953)
“[T]he recent discovery that the adoption of random strategies can improve the efficiency of hierarchical organizations” (Pluchino et al., 2011b, p. 3944, about their 2010 and 2011a)
“In all the examples we have presented, a common feature strongly emerges: the efficiency of an organization increases significantly if one adopts a random strategy of promotion with respect to a simple meritocratic promotion of the best members.” (Pluchino et al., 2011a, p. 3505)
“We think that these results could be useful to guide the management of large real hierarchical systems of different natures and in different fields.” (Pluchino et al., 2010, p. 471)
“It may well be, for example, that when there are many more deserving contestants than divisible resources (e.g. ten good applicants for five jobs), the final selection should be explicitly and publicly made by lottery.” (Thorngate, 1988, p. 14)
Elster (1989, p. 116) asks “why lotteries are so rarely used when there are so many good arguments for using them”. Neurath (1913, p. 11; via [3] Elster, 1989, pp. 121f.) even described the ability to use lotteries when there is insufficient evidence for a deliberate decision as the final of “four stages of development of mankind”.
There also are dissenting voices, e.g. Hofstee (1990). However, my approach was to assess the arguments favoring lotteries myself rather than to search for counterarguments in the literature. I therefore haven’t tried to provide a representative sample of the literature. Even if I had, I would expect explicit anti-lottery positions to be underrepresented because not using a lottery might seem like a default position not worth arguing for.
Real-world use of lotteries
I also came across references to mythical, historic, and contemporary cases in which lotteries were or are being used. These suggest that there are reasons – though not necessarily related to effectiveness – favoring lotteries in at least some real-world situations. I didn’t investigate these cases further, but briefly list some surveys:
- Elster (1989, sc. II.4-II.7, pp. 62-103).
- Elster (1989, p. 104) says he aimed for his list to be “reasonably exhaustive”, and that he would be “surprised if I have missed any major examples”.
- Elster (1989, pp. 104f.) lists the following patterns in the use cases surveyed by him:
- More frequent in democracies or popular estates.
- When they can be interpreted as an expression of God’s will.
- Assigning people for legal and administrative tasks.
- Allocating burdens – as opposed to goods – to people.
- Boyce (1994, pp. 457f.) describes some biblical, historic, and modern use cases.
- Boyle lists dozens of cases on his website. [4]
- At least three institutional science funders have implemented lotteries (Avin, 2018, pp. 1f.).
Limitations
I here list some limitations of my literature review. I chose not to pursue these points either because I was pessimistic about their value, or because they seemed too far removed from my research question (while potentially being interesting in other contexts).
- My review wasn’t systematic or comprehensive. I started from the publications I had first heard of, which were Avin’s work on science funding and the references in a Scientific American blog post by psychologist Scott Barry Kaufman.
- I generally didn’t try to independently verify results. In particular, I neither replicated simulations nor did I check details of calculations or proofs. Below, I do note where my impression is that a paper’s stated conclusions aren’t supported by its reported results, or where one publication seems to misrepresent another. However, I haven’t tried to identify all problems and merely report those I happened to notice.
- I didn’t (or not extensively) review:
- The following three books, which might be interesting to review: [5]
- Mauboussin (2012)
- Simonton (2004)
- Elster (1989) (not extensively)
- Not extensively the literature in what Liu and de Rond (2016, p. 12) call the “random school of thought in management”, e.g. the work of Jerker Denrell. The central claim of that school appears to be that “random variation should be considered one of the most important explanatory mechanisms in the management sciences” (Denrell et al., 2014).
- Not extensively the case of science funding. E.g., I didn’t consult references 69-72 in Pluchino et al. (2018), or most of the references cited by Avin.
- Not extensively the case for random selection of members of parliament, known as demarchy or sortition.
- Work by Nassim Taleb that was sometimes cited as being relevant, specifically his books The Black Swan: The Impact of the Highly Improbable and Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets.
- Empirical and theoretical work on the reliability and validity of relevant predictions of performance or impact in funding decisions, e.g. the work of Philip Tetlock, Martin et al. (2016), or Barnett’s (2008, p. 231) claim that in certain conditions “organizations are likely to fall into competency traps, misapplying to new competitive realities lessons that were learned under different circumstances”.
- Empirical and theoretical work on the conditions under which we might expect that the options from which we select are in some sense equally good, e.g. work related to the Efficient-Market Hypothesis.
My conclusions from the literature review
In summary, I think that the publications I reviewed:
- Demonstrate that it’s conceptually possible that deciding by lottery can have a strictly larger ex-ante expected value than deciding by some nonrandom procedures, even when the latter aren’t assumed to be obviously bad or to have high cost.
- Provide some qualitative suggestions for conditions under which lotteries might have this property.
- Don’t by themselves establish that this beneficial potential of lotteries is common, or that the beneficial effect would be large for some particular EA funding decision.
- Don’t provide a method to determine the performance of lotteries that would be easily applicable to any specific EA funding decision.
- Overall suggest that the case for lotteries is strongest in situations that are most similar to institutional science funding. However, even in such cases it remains unclear whether lotteries are strictly optimal.
In the following subsections, I’ll first give my reasoning behind the negative conclusions 3. to 5. I’ll then explain the first two, positive conclusions.
Unfortunately, the positive conclusions 1. and 2. are weak. I also believe they are relatively obvious, and are supported by shallow conceptual considerations as well as common sense, neither of which require a literature review.
Why I don’t believe these publications by themselves significantly support lotteries in EA funding decisions
I’ll first give several reasons that limit the relevance of the literature for my research question. While not all reasons apply to all publications I reviewed, at least one reason applies to most. In a final subsubsection, I explain why the negative conclusions 3. and 4. mentioned above follow.
Results outside of the scope of my investigation, and thus perhaps generally less relevant in an EA context
In an EA context we want to maximize expected utility ex ante, i.e. before some decision. However, some of the claims in the literature are ex post observations, or concern ex ante properties other than the expected value.
***Shortcomings of alternatives to lotteries that don’t imply lotteries are superior according to any criterion.***
These claims are made in the following context. We are concerned with decisions between options that each have an associated quantitative value. We want to select the highest-value option. However, we only have access to noisy measurements of an option’s value. That is, if we measured the value of the same option several times, there would be some ‘random’ variation in the measured values. For example, when making a hiring decision we might use the quantified performance on a work test as measurements; the scores on the tests would vary even for the same applicant because of ‘random’ variations in day-to-day performance, or ‘random’ accidental errors made when evaluating the tests.
One natural decision procedure in such a situation is to select the option with the highest measured value; I’ll call this option the winning option. I encountered several claims that might be viewed as shortcomings of the decision procedure to always choose the winning option. I was interested in identifying such shortcomings in order to then investigate whether lotteries avoid them. I’ll now list the relevant claims, and explain why I believe they cannot be used to vindicate lotteries according to any criterion (and so in particular not the ones I’m interested in).
The ‘optimizer’s curse’, or ‘post-decision regret’. Smith and Winkler (2006) prove that, under mild conditions, the winning option’s measured value systematically overestimates its actual value. This holds even when all measurements are unbiased, i.e. the expected value of each measurement coincides with the actual value of the measured option.
However, their result doesn’t imply that the ex-ante expected value of choosing another option would have been higher. In fact, I believe that choosing the winning option does maximize expected value if all measurements are unbiased and their reliability doesn’t vary too much. [6]
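To illustrate both points, here is a minimal Monte Carlo sketch (my own construction, not Smith and Winkler’s model). True values and unbiased, equally noisy measurements are drawn from normal distributions, and we always select the option with the highest measured value; all parameter values are made up for illustration. The winner’s measured value systematically overestimates its true value, yet its true value is still higher in expectation than that of a randomly chosen option.

```python
import numpy as np

rng = np.random.default_rng(0)
n_options, n_trials, noise_sd = 10, 100_000, 1.0  # illustrative values only

true_values = rng.normal(0, 1, size=(n_trials, n_options))                     # actual values
estimates = true_values + rng.normal(0, noise_sd, size=(n_trials, n_options))  # unbiased noisy measurements

rows = np.arange(n_trials)
winner = estimates.argmax(axis=1)  # always choose the option with the highest measured value

print("mean measured value of winner:", estimates[rows, winner].mean())    # systematically too high
print("mean true value of winner:    ", true_values[rows, winner].mean())  # lower: the optimizer's curse
print("mean true value, random pick: ", true_values[:, 0].mean())          # ~0: worse than picking the winner
```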
The winning option may be unlikely to be the best one. Thorngate and Carroll (1987), Frank (2016, p. 157, Fig. A1.2), and Pluchino et al. (2018, p. 1850014-13, Fig. 7), each using somewhat different assumptions, demonstrate that the absolute probability of the winning option actually being the highest-value one can be small.
However, more relevant for our purposes is the relative value of that probability: can we identify an option other than the winning one that is more likely to be the highest-value one? Their results provide no reason to think that we can, and again my initial impression is that given their assumptions we in fact cannot. In any case, we ultimately aim to maximize the expected value, not the probability of selecting the highest-value option. (My initial impression is that these two criteria always recommend the same option at least in the simple models of Thorngate and Carroll [1987] and Frank [2016], but in principle they could come apart.) But neither do these results provide a reason to think that deviating from the winning option increases our decision’s expected value. In fact, Fig. 8(b) in Pluchino et al. (2018, p. 1850014-15) indicates that selecting the winning option has a higher expected value than selecting an option uniformly at random.
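A variant of the sketch above illustrates the distinction: with ten options and fairly noisy measurements (again, my own illustrative numbers), the winner is the truly best option in only a minority of trials, but it is still more likely to be best than the option with the second- or third-highest measured value.

```python
import numpy as np

rng = np.random.default_rng(1)
n_options, n_trials, noise_sd = 10, 100_000, 1.0  # same illustrative setup as above

true_values = rng.normal(0, 1, size=(n_trials, n_options))
estimates = true_values + rng.normal(0, noise_sd, size=(n_trials, n_options))

order = np.argsort(-estimates, axis=1)       # options sorted by measured value, best first
truly_best = true_values.argmax(axis=1)

for rank in range(3):
    share = (order[:, rank] == truly_best).mean()
    print(f"P(option with the rank-{rank + 1} measurement is truly best) = {share:.2f}")
```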
Observations of this type may well have interesting implications for decision-making. For example, they might prompt us to increase the reliability of our measurements, or to abandon measurements that cost more than the value of the information they provide. A lottery may then perhaps be a less costly alternative. However, absent such secondary advantages, the claims reviewed in the preceding paragraphs don’t favor lotteries over choosing the winning option.
‘Matthew effects:’ Small differences in inputs (e.g. talent) can produce heavy-tailed outcomes (e.g. wealth). This is a common observation not limited to the literature I reviewed. For example, Merton (1968) famously described this phenomenon in his study of the reward system of science. He attributed it to ‘rich get richer’ dynamics, for which he established the term ‘Matthew effects’.
Two publications in which I encountered such claims are Pluchino et al. (2018, p. 1850014-8, Fig. 3) and Denrell and Liu (2012, sc. “Model 1: Extreme Performance Indicates Strong Rich-Get-Richer Dynamics”); they are also a central concern of Frank (2016). This list is not comprehensive.
Frank (2016) proposes the increasing size of winner-takes-all markets due to network effects as an explanation. Pluchino and colleagues’ (2018) model indicates that heavy-tailed outcomes can result when the impacts of random events on the outcome metric are proportional to its current value and accumulate over time. Outside of the literature I reviewed, preferential attachment processes have attracted a large amount of attention. They assume that a random network grows in such a way that new nodes are more likely to be connected to those existing nodes with a larger number of connections, and show that this results in a heavy-tailed distribution of the number of connections per node.
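As a rough illustration of this mechanism (a stripped-down sketch loosely inspired by Pluchino and colleagues’ description, not a reimplementation of their model), suppose agents with narrowly distributed ‘talent’ start with equal capital, and occasional random events double or halve their current capital. The inputs vary by a factor of about two, while the outcomes end up spread over orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_steps = 10_000, 80  # illustrative values only

talent = np.clip(rng.normal(0.6, 0.1, n_agents), 0, 1)  # narrow, roughly normal inputs
capital = np.full(n_agents, 10.0)                        # everyone starts equal

for _ in range(n_steps):
    u = rng.random(n_agents)
    lucky = (u < 0.03) & (rng.random(n_agents) < talent)  # lucky event, exploited with probability = talent
    unlucky = u > 0.97                                    # unlucky event, hits everyone alike
    capital = np.where(lucky, capital * 2, capital)       # effects are proportional to current capital...
    capital = np.where(unlucky, capital / 2, capital)     # ...and accumulate over time

print("talent range: %.2f to %.2f" % (talent.min(), talent.max()))
print("capital: median %.1f, 99th percentile %.1f, max %.1f"
      % (np.median(capital), np.percentile(capital, 99), capital.max()))
```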
In our context, findings of this type indicate that we cannot necessarily infer that the distribution of our options’ true values is heavy-tailed just based on heavy-tailed measurements. This is because the true values of our options might be the inputs of a process as described above, while our measurements might be the outcomes. However, again, this doesn’t imply that the expected value of choosing the winning option is suboptimal; in particular, it does nothing to recommend lotteries. For this reason, I did not in more detail review proposed explanations for why we might find heavy-tailed outcomes despite less varied inputs.
Summary. Loosely speaking, the results reviewed in this subsubsection may help with identifying conditions under which simple estimation procedures produce exaggerated differences between the options we choose from. However, they don’t by themselves suggest that these simple decision procedures result in the wrong ranking of options. Therefore, they constitute no argument against choosing the seemingly best option, and in particular no argument for using a lottery instead.
***Advantages of lotteries are based on criteria other than maximizing expected value, e.g. fairness.***
Some results actually say or imply that lotteries do have certain advantages over other decision procedures. However, I’ll now set aside several types of such advantages that appeal to a criterion other than the one I’m interested in, i.e. maximizing ex-ante expected value.
Fairness. The alleged fairness of lotteries, and unfairness of ‘naively meritocratic’ decisions, is a common theme in the literature. Boyce (1994, pp. 457f.) refers to previous work that tries to explain the occurrence of lotteries by their alleged fairness. [7] Thorngate (1988) assumes that “the ultimate goal [...] is to provide a means of allocating resources viewed as fair by all contestants” (ibid., p. 6) and describes the use of performance tests under certain conditions as “breeding grounds of invidious selection vitiating the principle of fairness that characterizes the contests in which they are employed”. Pluchino et al. (2018, p. 1850014-17) mention the “unfair final result” of their model. Again, this list is not comprehensive.
Such claims are hard to evaluate for several reasons. It remains unclear which specific conception of fairness they appeal to, whether they attribute fairness to procedures or outcomes, or even whether they are normative claims as opposed to descriptive claims about perceived fairness. For example, Pluchino et al. (2018, p. 1850014-8) seem to think that an outcome in which “the most successful people were the most talented” would be fairer than an outcome violating that property; but they neither argue for this assertion nor say whether this criterion exhausts their conception of fairness.
For examples of a more extensive discussion that employs specific definitions of fairness see Boyle (1988) and Elster (1989); Avin (2018, p. 9f.) refers to these definitions as well.
As an aside, the relationship between fairness and lotteries seems to be complex, with lotteries sometimes being perceived as decidedly unfair. See Hofstee (1990, p. 745) for an anecdote in which the use of a lottery provoked anonymous phone threats against a decision maker.
In any case, I wasn’t interested in pursuing questions around fairness because they tentatively seem less relevant to me for the allocation of EA funding. Of course, EAs need not embrace consequentialism and thus could have reasons to abide by procedural fairness constraints; even consequentialist EAs might intrinsically or instrumentally favor fair outcomes. However, it seems to me that concerns about fairness are more common when decisions affect a large number of people, when people cannot avoid being affected by a decision (or only at unreasonable cost), or when decision makers are accountable to many stakeholders with diverse preferences. In a similar vein, Avin (2018) notes that “[w]hile the drive for efficiency is often internal to the organisation, there are often external drivers for fairness”. For example, we commonly consider the fairness of taxation, social welfare, and similar policy issues; admissions to public schools; or decisions that allocate a significant share of all available resources, such as in the case of major institutional science funders. By contrast, I’d guess that most EA funding is either allocated by personal discretion, or by organizations whose focus on effectiveness is shared by key stakeholders such as donors and employees.
Psychological impact on decision makers or people affected by decisions. For example, Thorngate (1988, p. 14) recommends lotteries partly on the grounds that they “might relieve adjudicators of responsibility for distinctions they are incapable of making”, and that “[i]t might leave the losers with some pride, and the winners with some humility, if everyone knew that, in the end, chance and not merit sealed their fate”.
On the other hand, Elster (1989, pp. 105f.) is skeptical whether reaping such benefits from lotteries is feasible, and Pluchino et al. (2011a, p. 3509) mention “the possible negative psychological feedback of employees to a denied and expected promotion” as an objection to random promotions.
Such psychological effects may be relevant when allocating EA funding. However, my guess is that they would be decisive only in rare cases. For this reason, I didn’t pursue this issue further and for now focused on maximizing first-order expected value while ignoring these secondary effects.
Other criteria I set aside. Elster (1989, p. 109) quips that “[w]hen consensus fails, we might as well use a lottery.” Why a lottery? According to Elster, lotteries are more salient, harder to manipulate [8], and avoid bad incentives. [9] [10]
Pluchino et al. (2018) use an agent-based model intended to quantitatively assess the impact of different funding strategies, including lotteries, on the lifetime success of individuals. Perhaps motivated by concerns around fairness, they compare strategies according to “the average percentage [...] of talented people which, during their career, increase their initial capital/success” (Pluchino et al., 2018, p. 1850014-20). [11] Note that this metric ignores the amounts by which the success of individuals changes. It is therefore a poor proxy for what I’d be most interested in, i.e. the total sum of success across all individuals.
Limited evidence provided by agent-based models
Many publications I reviewed present agent-based models, sometimes as the only type of evidence. Unfortunately, I believe we can conclude little from these models even for the cases they were intended to capture.
To explain why, I’ll first summarize my reservations about the agent-based models of Pluchino et al. (2010) on promotions in hierarchical organizations. I believe this example illuminates structural shortcomings of agent-based models, which I’ll explain next. In principle, these shortcomings could be overcome by extensive experiments and a careful discussion. However, I’ll go on to describe why the publications I’ve reviewed succeed at this only to a limited extent, often coming back to Pluchino et al. (2010) and related work as the most illustrative example.
The model of Pluchino et al. (2010) is based on implausible assumptions, and regression to the mean is the main explanation of their results. [12] Pluchino et al. (2010) assume [13] that the performance of all employees within an organization is determined by independent draws from the same distribution, with a new draw after every promotion. An employee’s current performance is thus completely uninformative of their performance after a promotion. What is more, there are no interpersonal differences relevant to performance.
Put differently, and ignoring arguably irrelevant details of their model, they effectively assume an organization to be a collection of independent fair dice, with organizational performance represented by the sum of the points shown. Promotions correspond to rerolling individual dice. What’s the best strategy if you want to quickly maximize the sum of the dice? Due to regression to the mean, it’s clearly best to reroll the dice with the lowest scores first. Conversely, rerolling the dice with the highest scores will result in the largest possible expected decrease in performance. Rerolling dice uniformly at random will be about halfway in between these extremes.
In other words, we can easily explain their main result: Under the assumptions described above, promoting the best employees will dramatically harm organizational performance; promoting the worst employees will dramatically increase performance; and random promotions will be about halfway in between.
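A tiny simulation of the dice analogy (my own simplification for illustration, not their actual model) reproduces this pattern:

```python
import numpy as np

rng = np.random.default_rng(3)
n_dice, n_rerolls, n_trials = 100, 20, 2_000  # illustrative values only

def average_total(strategy):
    totals = []
    for _ in range(n_trials):
        dice = rng.integers(1, 7, n_dice)          # current 'performance' in each position
        for _ in range(n_rerolls):
            if strategy == "best":                 # 'promote the best': reroll the highest die
                i = dice.argmax()
            elif strategy == "worst":              # 'promote the worst': reroll the lowest die
                i = dice.argmin()
            else:                                  # 'random promotions': reroll a random die
                i = rng.integers(n_dice)
            dice[i] = rng.integers(1, 7)           # performance after a 'promotion' is a fresh draw
        totals.append(dice.sum())
    return np.mean(totals)

for s in ("best", "random", "worst"):
    print(s, round(average_total(s), 1))           # best < random < worst, driven by regression to the mean
```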
Unfortunately, as I’ll argue below, the key assumptions behind this result are implausibly extreme. [14]
Ad-hoc construction without an underlying theory. There seems to be no unifying theory behind the agent-based models I’ve seen. They were made up ad hoc, sometimes haphazardly drawing upon previous work in other fields. [15] It is therefore hard to immediately see whether they are based on sound assumptions, and if so to what extent their results generalize. Instead, we have to assess them on a case-by-case basis.
Uncanny medium level of complexity. One useful way to study the world perhaps is to examine a specific situation in a lot of detail. While findings obtained in this way may be hard to generalize, we can at least reliably establish what happened at a particular place and time.
Modeling, by contrast, is based on deliberate simplification and abstraction. When a model is based on a small number of simple assumptions and parameters, it is easy to examine questions such as: Which assumptions are crucial for the derived results, and why? Which types of real-world systems, if any, are faithfully modeled? How can we determine appropriate values for the model parameters based on empirical measurements of such systems?
For example, consider Smith and Winkler’s (2006) work on the optimizer’s curse. Their setting is highly abstract, and their assumptions parsimonious. Their results thus apply to many decisions made under uncertainty. Their central result is not only observed in simulations but can be crisply stated as a theorem (ibid., p. 315, sc. 2.3). By examining the proof we can understand why it holds. Overall, their discussion doesn’t stray far from basic probability theory, which is a well-understood theory. This helps us understand whether their assumptions are reasonable, and whether the modifications they chose to explore (e.g. ibid., p. 314, Table 2) are relevant. Similar remarks apply to Denrell and Liu (2012).
Contrast this with Pluchino and colleagues’ (2010) agent-based model of promotions in hierarchical organizations. It depends on the following assumptions and parameters, some of which are quite complex themselves:
- A directed graph representing the organizational structure, i.e. possible promotions from one position to another.
- Weights controlling to what extent the performance in a position affects organizational performance.
- The age at which employees retire.
- A performance threshold for firing employees.
- The distributions from which the degrees of competence and age of newly recruited employees are initially drawn.
- The dynamic rules for updating the age, performance levels, and positions (via promotions) of employees.
Given this level of complexity and the lack of a theoretical foundation, their simulation results initially appear as a ‘brute fact’. We can’t immediately see which assumptions they depend on, and how sensitive they might be to the values of various parameters. As a consequence, it is hard to see which real-world organizations their results may apply to. Does my organization have to have exactly 160 employees and six levels of hierarchy, as in their model? What about the frequency of promotions relative to the age of recruitment and retirement? Is the variance of performance between positions relevant – and if so, how could I measure its value? Etc.
I have argued above that it is in fact possible to easily explain their results – they are due to regression to the mean. Note, however, that my analysis was based on removing arguably irrelevant complexity from the model; its many details obstruct rather than assist understanding. This problem is illustrated by the fact that Pluchino et al. (2010) never present a similar analysis. Instead, they would have to answer the above questions through additional experiments. While they partly succeed in doing so, I’ll argue that they ultimately fail to address the main problem I identified earlier, i.e. their use of arguably implausible assumptions.
Lack of empirical validation. By empirical validation I mean roughly stating a correspondence rule between both the inputs (assumptions and parameters) and the outputs of a model on one hand, and the real world on the other hand. If these rules are appropriately operationalized, we can then empirically test whether the input-output pairs of a model match the corresponding real-world properties.
I believe it is telling that the only model I’ve seen that presents some amount of quantitative validation is the one by Smith and Winkler (2006, pp. 314f.). By contrast, all agent-based models I’ve seen were validated only in a qualitative way, [16] if at all. [17] The problem here is not just the absence of relevant data, but that it’s unclear precisely which real-world phenomena (if any) the model is supposed to represent. One exception is Harnagel’s (2018) agent-based model of science funding, which extends Avin’s (2017) epistemic landscape based on empirical bibliometric data. [18]
Missing or unconvincing discussion of assumptions and parameters. When there is no empirical validation, it would at least be desirable to have a rough discussion of a model’s assumptions and default parameter values. While evidence from such a discussion would be weaker, it would at least partly illuminate how the model is meant to relate to the real world, and could serve as a sanity check of its assumptions and default parameter values.
Avin (2017) provides the most extensive discussion of this type that I’ve seen; I’ll just point to a few examples. First, he clarifies that the scalar value attached to each research project in his model is meant to represent the amount of discovered “truths that will eventually contribute in a meaningful way to wellbeing” (ibid., p. 5). To motivate the selection mechanism operating on the modeled population of scientists, he refers to empirical work indicating that “[n]ational funding bodies support much of contemporary science” (ibid., p. 4). He also acknowledges specific limitations of his model, for example its landscape’s low dimensionality (ibid., p. 8., especially fn. 3). Crucially, he is able to defend his choice of a particular parameter, the size of the scientists’ field of vision, based on previous empirical results (ibid., p. 12). This is particularly important as he later shows that the relative performance of some funding strategies flips depending on that parameter’s value (ibid., p. 20, Fig. 6).
As a less convincing example, consider again Pluchino et al. (2010), which unfortunately I’ve found to be more typical among the models I examined. They do provide context on their parameters and assumptions, for example when saying that their “degree of competence” variable “includes all the features (efficiency, productivity, care, diligence, ability to acquire new skills) characterizing the average performance of an agent in a given position at a given level” (ibid., p. 468). However, aside from their discussion being generally less extensive and not covering all of their assumptions and parameters, I believe their discussion is unconvincing at the most crucial point.
This problem concerns what they call the Peter hypothesis – the assumption that an employee’s current performance is completely uninformative about their performance after a promotion. As I mentioned earlier, they find that their advertised result – i.e., random promotions outperforming a scheme that promotes the best eligible employee – depends on that hypothesis, and is reversed under an alternative assumption. However, their only discussion of the Peter hypothesis is the following:
“Actually, the simple observation that a new position in the organization requires different work skills for effectively performing the new task (often completely different from the previous one), could suggest that the competence of a member at the new level could not be correlated to that at the old one.” (Pluchino et al., 2010, p. 467)
Similarly, in a later publication that extends their model in an unrelated way, Sobkowicz (2010) writes that:
“[Assuming that performance after a promotion is independent from pre-promotion performance is] suitable in situations where the new post calls for totally different set of skills (salesman promoted to sales manager or to marketing manager position).”
However, these arguments seem to show at most that past performance is not a perfect predictor of future performance. Assuming no relation whatsoever seems clearly excessive, for two reasons. First, there will usually be at least some overlap between the tasks before and after a promotion. Second, performance correlates across different cognitive tasks, as captured by psychological constructs such as general intelligence, which differ between individuals and aren’t affected by promotions. (That their models omit such relevant interpersonal differences isn’t discussed at all.)
Missing or unconvincing ablation and sensitivity studies. Even without a full theoretical understanding, additional experiments can help illuminate a model’s behavior. In particular: When there are several assumptions or rules controlling the model’s dynamics, what happens if we remove them one-by-one (ablation studies)? And to what extent are results sensitive to the values of the input parameters?
Again, the problem I’ve found was not the complete absence of such experiments, but that they often failed to address the most relevant points. Consider again Pluchino et al. (2010). While they don’t provide details, they at least assure us “that the numerical results that we found for such an organization are very robust and show only a little dependence on the number of levels or on the number of agents per level (as long as it decreases going from the bottom to the top)” (ibid., p. 468), and “that all the results presented do not depend drastically on small changes in the value of the free parameters” (ibid., p. 469).
More importantly, they do show that their result crucially depends on the Peter hypothesis described above, and is reversed under an alternative assumption, namely that current performance is an unbiased estimate of future performance, which eliminates the regression-to-the-mean effect I described above.
Pluchino et al. (2010) thus presented opposing results for two alternative assumptions, both of which are implausibly extreme. It would therefore have been particularly important to investigate the model’s behavior under more realistic intermediate assumptions. Unfortunately, everything they do in that direction seems beside the point. Rather than considering intermediate assumptions, they discuss the case where it isn’t known which of the two assumptions holds (ibid., p. 470); [19] mixing the strategies of promoting the best and worst employees (ibid., p. 470, Fig. 3); and, in later work, the case where some promotions are described by one assumption and others by the other (Pluchino et al., 2011a, p. 3506, Fig. 10).
Instead, they experimentally investigate about every other possible variation in their model, particularly in their follow-up work (Pluchino et al., 2011a). For example, they change how often performance is updated relative to age, replace the simple pyramidal organizational structure from their 2010 paper with more complicated trees, vary the weights with which individual positions contribute to organizational performance, evaluate performance relative to the steady state under a baseline strategy rather than relative to the initial performance draw, and introduce age-dependent changes in performance. Of course, if my original analysis is correct, it’s not surprising that the relative performance of the random strategy is robust to all of these variations.
Again, Avin (2017) is the best example of doing this right (ibid., pp. 18-27, sc. 5). However, even here important open questions remain, such as whether the results depend on assuming a landscape with just two dimensions, an assumption acknowledged to be problematic.
Lotteries are compared against suboptimal alternatives
The problem with Pluchino and colleagues’ (2011a) follow-up work on promotions in hierarchical organizations that I consider most egregious illustrates yet another common issue. This is that lotteries are rarely shown to be optimal among a rich set of alternatives, let alone all possible strategies. Instead, they are compared to a small number of alternatives, which in some cases clearly don’t include the best one.
Recall that Pluchino et al. (2010) found that if pre-promotion performance is completely uninformative of post-promotion performance, then promoting the worst employees dramatically outperforms both promoting the best and random promotions. However, this winning strategy of promoting the worst is completely absent from their follow-up paper (Pluchino et al., 2011a). There they just compare random promotions with promoting the best, which as I argued earlier is the worst possible strategy given the Peter hypothesis. Put differently, they exhibit precisely those results which are least informative about the performance of random promotions. It is, for example, not surprising that no mixture of these two strategies outperforms completely random promotions (ibid., p. 3503, Fig. 7).
Similarly, in Figure 10 of Pluchino et al. (2018, p. 1850014-20) we see that in their simulations, at least for small amounts of total funding per round, giving the same small amount of funding to everyone worked even better than funding by lottery. Such egalitarian funding schemes also outperformed random ones at the task of distributing a fixed amount of funds (Pluchino et al., 2018, p. 1850014-23, Fig. 12).
Lastly, Avin (2017) evaluates lotteries only against strategies which are maximally short-sighted or explore as little as possible. It remains an open question whether one could construct an even better nonrandom strategy.
Unconvincing arguments and references
Especially in the publications by Biondo et al. and Pluchino et al., I encountered several other passages I didn’t find convincing. They were often sufficiently vague that it’s hard to conclusively demonstrate a mistake. Even if mistaken, these passages wouldn’t directly invalidate the advertised findings about the performance of random strategies, but they make me more wary of taking such findings at face value.
***Misunderstandings and omissions in references to Sinatra and colleagues’ (2016) model of scientific careers***
Summary of Sinatra et al. (2016). In a paper published in Science, Sinatra et al. (2016) analyze the publication records of a large sample [20] of scientists. They find “two fundamental characteristics of a scientific career” (ibid., p. 596), both of which are prominently mentioned in their paper’s summary and abstract.
Their first result is a “random-impact rule” (ibid., p. 596). It says that the “highest-impact work can be, with the same probability, anywhere in the sequence of papers published by a scientist” (ibid., p. 596). As a measure of a publication’s impact, they use its number of citations after 10 years.
Note that this result concerns the distribution of impact within a given scientist’s career. In the context of funding decisions, we’d be more interested in differences between scientists.
Their second main result addresses just such differences. Specifically, their favored model contains a parameter Q, which they describe as the “sustained ability to publish high-impact papers” (p. 596). Crucially, Q is constant within each career, but differs between scientists. The higher a scientist’s Q the larger the expected value of their next paper’s impact. Similarly, if we compare scientists with the same number of publications then higher-Q individuals will likely have a larger total impact. [21]
Moreover, Sinatra et al. (2016, pp. aaf5239-4f.) describe how a scientist’s Q can be reliably measured early in their career. Overall, they conclude that “[b]y determining the value of Q during the early stages of a scientific career, we can use it to predict future career impact.” (ibid., p. aaf5239-5) [22]
Note that Sinatra et al. (2016, pp. aaf5239-2f.) did consider, but statistically reject, an alternative “random-impact model” without the parameter Q, i.e. assuming no interpersonal differences in abilities.
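In simplified form, the Q-model says that a paper’s impact is the product of a per-scientist ability factor Q and a per-paper luck factor. The following sketch uses illustrative lognormal distributions (my own choices, not their fitted values) to show how both headline results can coexist: within a career the best paper lands at a roughly random position, while across careers an early estimate of Q predicts later impact.

```python
import numpy as np

rng = np.random.default_rng(4)
n_scientists, papers_each = 1_000, 30  # illustrative values only

Q = rng.lognormal(0.0, 0.5, n_scientists)                    # per-scientist ability, constant over a career
luck = rng.lognormal(0.0, 1.0, (n_scientists, papers_each))  # per-paper luck
impact = Q[:, None] * luck                                   # simplified Q-model: impact = Q * luck

# Random-impact rule: the highest-impact paper is about equally likely at any career position.
position_of_best = impact.argmax(axis=1)
print(np.bincount(position_of_best, minlength=papers_each) / n_scientists)

# But across scientists, ability estimated from the first 10 papers predicts later impact.
early_Q_estimate = impact[:, :10].mean(axis=1)
later_impact = impact[:, 10:].sum(axis=1)
print("correlation of log early estimate with log later impact:",
      round(np.corrcoef(np.log(early_Q_estimate), np.log(later_impact))[0, 1], 2))
```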
My takeaway. Sinatra et al. (2016) disentangle the roles of luck, productivity, and ability [23] in scientific careers (cf. ibid., p. aaf5239-6). They find that the role of luck is considerable but limited. If their analysis is sound, there are persistent differences between scientists’ abilities to publish much-cited papers, which can be reliably estimated based on their track records. If we were interested in maximizing citations, this might suggest funding the scientists with the highest estimated ability. In no way do their results favor funding by lottery. If anything, they suggest that we could find a reliable and valid ‘meritocratic’ strategy despite considerable noise in the available data.
References to Sinatra et al. (2016) elsewhere. I encountered several references to Sinatra et al. (2016) that only mentioned, with varying amounts of clarity, their finding that impact is randomly distributed within a career. These references were generally made to support claims around the large role of luck or the good performance of randomized strategies. Failing to mention the finding about differences in ability between scientists in this context strikes me as a relevant omission.
Consider for example:
“Scientific impact is randomly distributed, with high productivity alone having a limited effect on the likelihood of high-impact work in a scientific career.” (Beautiful Minds blog @ Scientific American)
“Actually, such conclusions [about diminishing marginal returns of research funding found by other papers] should not be a surprise in the light of the other recent finding [Sinatra et al., 2016] that impact, as measured by influential publications, is randomly distributed within a scientist’s temporal sequence of publications. In other words, if luck matters, and if it matters more than we are willing to admit, it is not strange that meritocratic strategies reveal less effective than expected, in particular if we try to evaluate merit ex-post.“ (Pluchino et al., 2018, p. 1850014-18)
The first quote is from a list of findings taken to support the claim that “we miss out on a really importance [sic] piece of the success picture if we only focus on personal characteristics in attempting to understand the determinants of success”. Now it is true that Sinatra et al. (2016) find the impact of a given scientist’s next work to be considerably influenced by luck. With respect to the “likelihood of high-impact work in a scientific career”, it is true that they find the effect of productivity to be limited. However, they also find that the aforementioned likelihood in fact is to a considerable extent determined by differences in “personal characteristics”, i.e. their parameter Q.
The second quote is from a discussion of whether it is “more effective to give large grants to a few apparently excellent researchers, or small grants to many more apparently ordinary researchers” (Pluchino et al., 2018, p. 1850014-18). However, I fail to see how Sinatra and colleagues’ results are related to findings about diminishing marginal returns of research funding at all. [24] Indeed, they didn’t consider any funding data.
***Sweeping generalizations and discussions that are vague or dubious***
Biondo and colleagues’ discussion of the Efficient Market Hypothesis. For example, I’m puzzled by Biondo and colleagues’ (2013a) discussion of the Efficient Market Hypothesis:
“We can roughly say that two main reference models of expectations have been widely established within economic theory: the adaptive expectations model and the rational expectation model. [...]
Whereas, the rational expectations approach [...] assumes that agents know exactly the entire model describing the economic system and, since they are endowed by perfect information, their forecast for any variable coincides with the objective prediction provided by theory. [...]
The so-called Efficient Market Hypothesis, which refers to the rational expectation Models [...]
Rational expectations theorists would immediately bet that the random strategy will easily loose [sic] the competition [...]” (Biondo et al., 2013a, pp. 608f.)
“Thus, it is theoretically consequent that, if the Efficient Markets Hypothesis held, the financial markets would result complete, efficient and perfectly competitive. This implies that, in presence of complete information, randomness should play no role, since the Efficient Market Hypothesis would generate a perfect trading strategy, able to predict exactly the market values, embedding all the information about short and long positions worldwide.” (Biondo et al., 2013a, p. 615)
My understanding is that the trading strategy suggested by the Efficient Market Hypothesis is precisely the random strategy vindicated by Biondo and colleagues’ empirical analysis. It’s hard to be certain, but the above discussion seems to suggest they think the opposite.
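For what it’s worth, here is the reading I have in mind, as a toy simulation (my own construction, unrelated to Biondo and colleagues’ data or models): if returns are unpredictable, then any non-anticipating strategy, whether momentum-based or random, has roughly zero expected excess profit, so the hypothesis gives no reason to expect a random strategy to ‘easily lose’.

```python
import numpy as np

rng = np.random.default_rng(5)
n_runs, n_days = 5_000, 250  # illustrative values only

# Toy 'efficient' market: daily returns are independent noise, so past prices carry no information.
returns = rng.normal(0, 0.01, size=(n_runs, n_days))

momentum_pos = np.sign(returns[:, :-1])                      # hold tomorrow whatever worked today
random_pos = rng.choice([-1, 1], size=(n_runs, n_days - 1))  # coin-flip positions

print("average profit, momentum strategy:", (momentum_pos * returns[:, 1:]).sum(axis=1).mean())
print("average profit, random strategy:  ", (random_pos * returns[:, 1:]).sum(axis=1).mean())
# Both are ~0: in this idealized setting, neither beats the other in expectation.
```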
A puzzling remark on large organizations. In the context of discussing psychological effects ignored by their model, Pluchino et al. (2011a, p. 3509) assert that “in a very big company it is very likely that the employees completely ignore the promotion strategies of their managers”. While they don’t seem to rely on that assertion in that discussion or anywhere else, it still seems bizarre to me, and I have no idea why they think this is the case.
Sweeping generalizations based on the superficial similarity that several findings involve randomness or luck. For example, Pluchino et al. (2011b, p. 3944) claim that their finding about randomly selecting members of parliament “is in line with the positive role which random noise plays often in nature and in particular in physical systems [...]. On the other hand, it goes also in the same direction of the recent discovery [...] that, under certain conditions, the adoption of random promotion strategies improves the efficiency of human hierarchical organizations [...].”
Biondo et al. (2013a, p. 607) are even more far-reaching in their introduction:
“In fact there are many examples where randomness has been proven to be extremely useful and beneficial. The use of random numbers in science is very well known and Monte Carlo methods are very much used since long time [...].”
While none of these claims is unambiguously false, I find their relevance dubious. Both the criteria according to which we make judgments such as “useful and beneficial” and the role played by randomness seem to vary drastically between the examples appealed to here.
In particular, it doesn’t seem to me that the good performance of randomization in Pluchino et al. (2011b) is at all related to their previous work on promotions in hierarchical organizations, which uses a very different model. Indeed, I’ve argued that the key explanation for Pluchino and colleagues’ (2010, 2011a) results on promotions simply is regression to the mean; by contrast, I don’t think regression to the mean plays a role in Pluchino et al. (2011b). Similarly, I suspect that Biondo and colleagues’ (2013a, 2013b) results are explained by the efficiency of financial markets, while a similar explanation isn’t applicable to these other cases.
How my negative conclusions follow
I earlier said that the literature:
- Doesn’t by itself establish that a beneficial potential of lotteries is common, or that the beneficial effect would be large for some particular EA funding decision.
- Doesn’t provide a method to determine the performance of lotteries that would be easily applicable to any specific EA funding decision.
- Overall suggests that the case for lotteries is strongest in situations that are most similar to institutional science funding. However, even in such cases it remains unclear whether lotteries are strictly optimal.
So far I’ve set aside work that doesn’t actually vindicate lotteries, or does so only according to criteria other than maximizing the expected value of some quantity of interest prior to a decision. While some relevant claims remain, they often rely on agent-based models, which I’ve argued aren’t by themselves strong evidence about the performance of lotteries in any real-world situation. In any case, they at most provide reasons to think that lotteries do better than the specific alternatives they are compared against, not that lotteries are optimal among all decision strategies. Finally, some publications contain dubious claims that might warrant caution against taking their results at face value.
In summary, I think that for most results I’ve seen it’s unclear whether they apply in any real-world situation, or even how one would go about verifying that they do. In particular, we cannot conclude anything about actual funding decisions faced by EAs, hence conclusions 3. and 4. follow.
Conclusion 5. is based on me overall being somewhat sympathetic to Avin’s (2015, 2017, 2018) case for funding science by lottery. I’ve explained above why I believe this work succeeds at overcoming the problems affecting agent-based models to a greater extent. However, my impression is mostly based on Avin also utilizing several other types of evidence that demonstrate shortcomings of the current grant peer review system (see especially Avin, 2018, pp. 2-8, sc. 2).
Positive conclusions that remain
I believe the literature I reviewed still supports the conclusion that:
- It’s conceptually possible that deciding by lottery can have a strictly larger expected ex-ante value than deciding by some nonrandom strategies, even when the latter aren’t assumed to be obviously bad or to have high cost.
Consider for example Pluchino and colleagues’ (2010, 2011a) work on promotions in hierarchical organizations, which I harshly criticized above. My criticism can at most show that their results won’t apply to any real-world organization. However, their model still shows that it’s possible – even if perhaps only under unrealistic assumptions – that random promotions can outperform the nonrandom strategy of promoting the best employees. If my analysis is correct, the latter strategy is bad – indeed, the worst possible strategy given their assumptions – but it might not qualify as “obviously bad” (even given their assumptions). This is even more clearly the case for Avin’s (2017) model of science funding, which I’ve argued also is less affected by other problems I listed; he tests lotteries against strategies that seem reasonable, and indeed outperform lotteries for some parameter values.
None of these results are due to lotteries being assumed to be less costly. Indeed, none of the models I’ve seen attach any cost to any decision procedure.
I’ve also said that the literature:
- Provides some qualitative suggestions for conditions under which lotteries might have this effect.
In brief, it seems that the good performance of lotteries in the models I’ve seen is due to one of:
- The available options being equally good [25] in expectation (Pluchino et al., 2010, 2011a; Biondo et al., 2013a, 2013b), perhaps because they have been selected for this by an efficient market.
- Lotteries ensuring more exploration than some alternatives, thus providing a better solution to the exploration vs. exploitation trade-off (Avin, 2017).
- The relation between our current value estimates and the true value of the eventual outcome being affected by dynamic effects that are hard to take into account at decision time (Avin, 2017).
Additional empirical evidence – as opposed to simulation results – also suggests that:
- Lotteries can prevent biases such as anti-novelty bias (Avin, 2018, p. 11, fn. 10), which might negatively affect some alternatives.
- Lotteries ensure a spread of funding, which can be better than more concentrated funding due to diminishing marginal returns (Fortin and Currie, 2013; Mongeon et al., 2016; Wahls, 2018); a toy numerical illustration follows this list.
- Lotteries can be less costly for both decision makers and grantees (Avin, 2018, pp. 6-8, sc. 2.2).
- Lotteries may be a viable compromise when stakeholders disagree about how a decision should be made (Elster, 1989, p. 109).
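On the diminishing-returns point, a toy calculation (my own, using a made-up square-root ‘production function’ as a stand-in for diminishing marginal returns) shows why spreading a fixed budget can beat concentrating it:

```python
import numpy as np

def output(funding):
    return np.sqrt(funding)  # made-up concave returns: each extra dollar yields less than the last

budget = 1_000_000
for n_groups in (1, 10, 100):
    total = n_groups * output(budget / n_groups)  # total output if the budget is split evenly
    print(n_groups, "groups:", round(float(total)))
```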
My overall take on allocating EA funding by lottery
Summary. I’m highly uncertain whether one could improve, say, EA Grants or Open Phil’s grantmaking by introducing some random element. However, I’m now reasonably confident that the way to investigate this would be “bottom-up” rather than “top-down” – i.e. examining the specifics of a particular use case rather than asking when lotteries can be optimal under idealized conditions. Specifically, lotteries may be optimal in cases where explicitly accounting for the influence of an individual decision on future decisions is prohibitively costly.
The general case for why lotteries might be optimal in practice. [26] When one decision affects others, optimizing each decision in isolation may not lead to the overall best collection of decisions. It can be best to make an individually suboptimal decision when this enables better decisions in the future. Lotteries turn out to sometimes have this effect, for example because they explore more and thus provide more information for future decisions.
Idealized agents with access to all relevant information may be able to explicitly account for and optimize effects on other decisions. They may thus find a fine-tuned nonrandom strategy that beats a lottery. For example, rather than exploring randomly, they could calculate just when and how to explore to maximize the value of information.
However, this explicit accounting may be prohibitively costly in practice, and the required data may not be available. For example, it’s impossible to fully account for diminishing marginal returns of funding without knowing the decisions of other funders. In addition, even when information is available, its use may be afflicted by common biases. For these reasons, it may in practice be infeasible to improve on a lottery.
General implications for the use of lotteries. Decision-makers should consider lotteries when they make a collection of related decisions where the beneficial effects of lotteries are relevant. A good strategy may then be to:
- Assess how an individual decision influences future decisions.
- Estimate how costly it’d be to explicitly account for these influences.
- Based on this, try to improve on lotteries, and to use them if and only if these efforts fail or prove too costly.
Whether these efforts succeed will depend on the specifics of the use case.
Avin (personal communication) has suggested that we are less likely to beat a lottery if at the time of making a decision:
- We don’t know the decisions by other relevant actors.
- We anticipate long feedback loops.
- We face disagreement or deep uncertainty about how to evaluate our options.
Implications for the use of lotteries in EA funding. As I explained, my literature review leaves open the question of whether a lottery would improve any particular EA funding decision. This has to be assessed on a case-by-case basis, and my review doesn’t provide much guidance for doing so. Instead, I suggest following the strategy outlined above for the general case.
When tempted by a lottery, consider alternatives
Lotteries are an easily usable benchmark against which other decision strategies can be tested, either in a real-world experiment or a simulation.
What if a lottery outperforms a contender? The best reaction may be to understand why, and to use this insight to construct an even better third strategy.
In this way, lotteries could be used as a tool for analyzing and improving decisions, even in cases where they aren’t the best decision procedure, all things considered.
As an example, we might realize that one advantage of science funding by lotteries is to avoid cost for applicants. Sometimes there may be decision mechanisms that have the same advantage without using a lottery. For instance, Open Phil’s ‘Second Chance’ Program considered applications that had already been written for another program, and thus imposed almost zero marginal cost on applicants. [27]
Similarly, there may be other ways to ensure exploration, reduce cost, or reap the other benefits of lotteries I listed above.
However, in view of the obstacles mentioned earlier, it may in practice not be possible to improve on a lottery.
Avenues for further research
Based on this project, I see four avenues for further research. While I’m not very excited about any of them, it seems plausible to me that it might be worthwhile for someone to pursue them.
- Research inspired by Denrell and Liu (2012, Model 2):
- Are there EA situations – not necessarily related to funding – where our estimates’ reliability varies a lot between options?
- This could for example be the case when comparing work across cause areas, or synthesizing different types of evidence.
- There would then be reasons not to choose the option with highest estimated value, as demonstrated by Denrell and Liu.
- However, this is a relatively easy-to-find extension of the optimizer’s curse (Smith and Winkler, 2006), and also broadly related to several blog posts by Holden Karnofsky and others such as this one, all of which have received significant attention in the EA community. I’d therefore guess that most easily applicable benefits from being aware of such effects have already been reaped.
- Investigate to what extent Avin’s (2015, 2017, 2018) case for funding science by lottery applies to those EA funding decisions which are most similar to institutional science funding, i.e. perhaps Open Phil’s science grants.
- Note that in some cause areas Open Phil’s share of funding may be comparable to that of large institutional science funders within science.
- In principle, the dynamic effects described by Avin (2017) also affect science funded by Open Phil.
- I expect doing this well would require access to data specific to the funding situation, which might not be available.
- I’d guess this would be more likely to identify specific improvements to current funding strategies than to actually recommend a lottery.
- Pick a specific EA funding decision (e.g. EA Grants or donations by an individual) for which relevant information is available, and assess how difficult it would be to overcome the obstacles to explicitly accounting for indirect effects.
- Adapt Avin’s (2017) epistemic landscape model to EA funding, e.g. by adding:
- Potential for negative impact.
- Heavy-tailed impacts.
- A new dynamic effect representing crucial considerations that can dramatically change impact estimates, including flipping their sign.
- I started introducing some of these effects and ran some preliminary experiments. The code is available on request, but starting from Avin’s version may be better since I made only a few, poorly commented changes.
- One way to empirically anchor such a model would be to look at how GiveWell’s cost-effectiveness estimates have changed over time.
- Among the four research directions mentioned here, I’m least excited about this one. This is because I think such a model would provide limited practical guidance by itself, for reasons similar to the ones discussed above.
Acknowledgements
I did this work as part of a 6-week Summer Research Fellowship at the Centre for Effective Altruism. (However, I spent only the equivalent of ~3 weeks on this project as I was working on another one in parallel.) I thank Shahar Avin for a helpful conversation at an early stage of this project and feedback on this post, as well as Sam Clarke, Max Dalton, and Johannes Treutlein for comments on notes that served as a starting point for writing this post.
References
Avin, S., 2015. Funding science by lottery. In Recent Developments in the Philosophy of Science: EPSA13 Helsinki (pp. 111-126). Springer, Cham.
Avin, S., 2017. Centralized Funding and Epistemic Exploration. The British Journal for the Philosophy of Science.
Avin, S., 2018. Policy Considerations for Random Allocation of Research Funds. In RT. A Journal on Research Policy and Evaluation, 6(1).
Barnett, W.P., 2008. The red queen among organizations: How competitiveness evolves. Princeton University Press.
Biondo, A.E., Pluchino, A. and Rapisarda, A., 2013a. The beneficial role of random strategies in social and financial systems. Journal of Statistical Physics, 151(3-4), pp.607-622. [arXiv preprint]
Biondo, A.E., Pluchino, A., Rapisarda, A. and Helbing, D., 2013b. Are random trading strategies more successful than technical ones? PloS one, 8(7), p.e68344.
Boyce, J.R., 1994. Allocation of goods by lottery. Economic inquiry, 32(3), pp.457-476.
Boyle, C., 1998. Organizations selecting people: how the process could be made fairer by the appropriate use of lotteries. Journal of the Royal Statistical Society: Series D (The Statistician), 47(2), pp.291-321.
Denrell, J. and Liu, C., 2012. Top performers are not the most impressive when extreme performance indicates unreliability. Proceedings of the National Academy of Sciences, 109(24), pp.9331-9336.
Denrell, J., Fang, C. and Liu, C., 2014. Perspective—Chance explanations in the management sciences. Organization Science, 26(3), pp.923-940.
Elster, J., 1989. Solomonic judgements: Studies in the limitation of rationality. Cambridge University Press.
Fortin, J.M. and Currie, D.J., 2013. Big science vs. little science: how scientific impact scales with funding. PloS one, 8(6), p.e65263.
Frank, R.H., 2016. Success and luck: Good fortune and the myth of meritocracy. Princeton University Press.
Grim, P., 2009, November. Threshold Phenomena in Epistemic Networks. In AAAI Fall Symposium: Complex Adaptive Systems and the Threshold Effect (pp. 53-60).
Harnagel, A., 2018. A Mid-Level Approach to Modeling Scientific Communities. Studies in History and Philosophy of Science (forthcoming). [Preprint]
Hofstee, W.K., 1990. Allocation by lot: a conceptual and empirical analysis. Information (International Social Science Council), 29(4), pp.745-763.
Liu, C. and De Rond, M., 2016. Good night, and good luck: perspectives on luck in management scholarship. The Academy of Management Annals, 10(1), pp.409-451.
Martin, T., Hofman, J.M., Sharma, A., Anderson, A. and Watts, D.J., 2016, April. Exploring limits to prediction in complex social systems. In Proceedings of the 25th International Conference on World Wide Web (pp. 683-694). International World Wide Web Conferences Steering Committee. [arXiv preprint]
Mauboussin, M.J., 2012. The success equation: Untangling skill and luck in business, sports, and investing. Harvard Business Press.
Merton, R.K., 1968. The Matthew effect in science: The reward and communication systems of science are considered. Science, 159(3810), pp.56-63.
Mongeon, P., Brodeur, C., Beaudry, C. and Larivière, V., 2016. Concentration of research funding leads to decreasing marginal returns. Research Evaluation, 25(4), pp.396-404. [arXiv preprint]
Neurath, O.I., 1913. Die Verirrten des Cartesius und das Auxiliarmotiv.(Zur Psychologie des Entschlusses.) Vortrag. Barth.
Phelan, S.E. and Lin, Z., 2001. Promotion systems and organizational performance: A contingency model. Computational & Mathematical Organization Theory, 7(3), pp.207-232.
Pluchino, A., Rapisarda, A. and Garofalo, C., 2010. The Peter principle revisited: A computational study. Physica A: Statistical Mechanics and its Applications, 389(3), pp.467-472. [arXiv preprint]
Pluchino, A., Rapisarda, A. and Garofalo, C., 2011a. Efficient promotion strategies in hierarchical organizations. Physica A: Statistical Mechanics and its Applications, 390(20), pp.3496-3511.
Pluchino, A., Garofalo, C., Rapisarda, A., Spagano, S. and Caserta, M., 2011b. Accidental politicians: How randomly selected legislators can improve parliament efficiency. Physica A: Statistical Mechanics and Its Applications, 390(21-22), pp.3944-3954.
Pluchino, A., Biondo, A.E. and Rapisarda, A., 2018. Talent Versus Luck: The Role Of Randomness In Success And Failure. Advances in Complex Systems, 21(03n04), p.1850014. [arXiv preprint]
Simonton, D.K., 2004. Creativity in science: Chance, logic, genius, and zeitgeist. Cambridge University Press.
Sinatra, R., Wang, D., Deville, P., Song, C. and Barabási, A.L., 2016. Quantifying the evolution of individual scientific impact. Science, 354(6312), p.aaf5239. [ungated PDF]
Smith, J.E. and Winkler, R.L., 2006. The optimizer’s curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52(3), pp.311-322.
Sobkowicz, P., 2010. Dilbert-Peter model of organization effectiveness: computer simulations. arXiv preprint arXiv:1001.4235.
Thorngate, W. and Carroll, B., 1987. Why the best person rarely wins: Some embarrassing facts about contests. Simulation & Games, 18(3), pp.299-320.
Thorngate, W., 1988. On the evolution of adjudicated contests and the principle of invidious selection. Journal of Behavioral Decision Making, 1(1), pp.5-15.
Wahls, W.P., 2018. The NIH must reduce disparities in funding to maximize its return on investments from taxpayers. eLife, 23(7)
Weisberg, M. and Muldoon, R., 2009. Epistemic landscapes and the division of cognitive labor. Philosophy of science, 76(2), pp.225-252.
Endnotes
[1] In particular, I didn’t think about game-theoretic reasons for making random decisions. For example, it is well known that in some games such as Matching pennies the only Nash equilibria are in mixed strategies.
[2] I don’t count strategies as random just because they use noisy estimators. For example, suppose you make a hiring decision based on performance in a work test. You might find that there is some variation in performance on the test, even when it’s taken by the same person. This might appear as random variation, and one might model performance as a random variable. Cf. Avin (2018, p. 11): “The potential error in the test in fact serves as a kind of lottery, which operates on top of the main function of the test, which is to predict performance.” However, I don’t refer to decisions using noisy estimators as lotteries or random, unless they in addition use deliberate randomization.
[3] I didn’t consult the primary source.
[4] In the only example I looked at, listed on the website as “EU decides where Agencies located to replace London(181)”, my impression from the news story quotes Boyle provides is that a lottery was merely used to break a voting tie.
[5] One reason why I didn’t do so was merely that I couldn’t easily get access to Mauboussin (2012) and Simonton (2004), while I became aware of Elster (1989) only shortly before the end of this project.
[6] Making this statement more precise and defending it is beyond the scope of this post. Denrell and Liu (2012) show that choosing the winning option can have suboptimal ex-ante expected value when the reliability of measurements varies too much; see pp. 1f. of their “Supporting Information” for a highly relevant discussion of a “monotone likelihood ratio property”, and in particular their conclusion that “our result cannot happen when the noise term is normally distributed [...] and our result could happen for some fat-tailed noise distributions”. EDIT: The number of available options is also relevant, see this comment by Flodorner.
[7] Note that Boyce finds that many lotteries in practice use discriminatory participation fees or other at first glance unfair mechanisms. He therefore rejects fairness as an explanation for the use of lotteries, and instead appeals to the self-interest of the lottery’s primary user group.
[8] Cf. also Pluchino et al. (2011b, p. 3953): “On the contrary the process of elections by vote can be subject to manipulation by money and other powerful means.”
[9] “We might think that physical ability, which is an easily measured factor, is the only relevant criterion in the selection for military service and yet use a lottery to reduce the incentive for self-mutilation.” (Elster 1989, p. 110). And: “[R]andomizing prevents recipients of scarce resources from trying to make themselves more eligible, at cost to themselves or society.” (Elster, 1989, p. 111)
[10] However, note that the incentive effects of lotteries need not be desirable. For example, promoting staff by lottery may on one hand decrease wasteful self-promotion, but on the other hand remove incentives to do a good job. The first of these effects has been explored by Sobkowicz (2010, sc. 2.8ff.) in a model otherwise similar to Pluchino and colleagues’ (2010). The second effect is implicit in Pluchino and colleagues’ (2011a, p. 3509) recommendation “to distinguish promotions from rewards and incentives for the good work done”.
[11] More precisely, their Figures 10 to 12 use an appropriately normalized version of the percentage mentioned in the main text. See their definition of E_{norm} for details (Pluchino et al., 2018, pp. 1850014-20f.).
[12] While I’m relatively confident in my analysis of Pluchino et al. (2010, 2011a), I haven’t done my own simulations or proofs to conclusively confirm it.
[13] The assumption described in the main text is referred to as “Peter hypothesis” by Pluchino et al. (2010, p. 468). They also investigate an alternative “common sense hypothesis”, under which random promotions no longer outperform the strategy of promoting the best eligible employees (ibid., p. 469, Fig. 2).
[14] As an aside, the work of Pluchino et al. (2010, 2011a) on promotions in hierarchical organizations also generated significant media attention. Pluchino et al. (2011a) point out that their 2010 paper “was quoted by several blogs and specialized newspapers, among which the MIT blog, the New York Times and the Financial Times, and it was also awarded the IG Nobel prize 2010 for ‘Management’”. All articles I’ve checked were brief and uncritical of Pluchino and colleagues’ results.
[15] For example, Pluchino et al. (2011b) consider the location of legislators in a 2-dimensional parameter space from Carlo M. Cipolla’s work on stupidity. ‘Epistemic landscape’ models of scientific activity as used by Avin (2015, 2017) and some previous authors (e.g. Grim, 2009; Weisberg and Muldoon, 2009) have been inspired by fitness models from evolutionary biology (Avin, 2015, pp. 78-86, sc. 3.3).
[16] For example, Pluchino et al. (2018) refer to the heavy-tailed distribution of wealth despite a less skewed distribution of inputs such as intelligence or work hours. Their model reproduces a similar effect. However, they provide no correspondence between any of the variables in their model and a particular measurable real-world quantity. They therefore cannot provide any quantitative validation, such as for example checking whether their model reproduces the correct amount of heavy-tailedness relative to a given set of inputs. Similarly, Pluchino et al. (2011a, p. 3510) refer to a management strategy emphasizing task rotation at Brazilian company SEMCO, but provide no details on how that strategy is similar to their random promotions. Nor do they explain on what basis they say that task rotation at SEMCO was “applied successfully”, other than by calling SEMCO’s CEO a “guru”, pointing out he gives lectures at the Harvard Business School, and saying that SEMCO grew from 90 to 3000 employees, a metric very different from the one used in their model.
[17] One limitation is that I consulted some, but not all, references in Avin (2017) to check whether epistemic landscape models have been empirically validated in previous publications.
[18] I am indebted to Shahar Avin for pointing me to this reference.
[19] Unfortunately, their recommendation for what to do when it isn’t known which of their two assumptions applies is also unconvincing. This is because they don’t consider different degrees of belief in those assumptions, and instead present a recommendation that is only correct if one’s credence is divided about 50-50. What one should actually do, given their results, is either to always promote the best employees or to always promote the worst, depending on whether one has less or more than 50% credence in their Peter hypothesis.
[20] Most of their reported results are based on the publications of 2,887 physicists in the Physical Review journal family (Sinatra et al., 2016, p. aaf5239-7). This sample was selected from a much larger dataset of 236,884 physicists according to criteria such as minimum career length and publication frequency. Their chosen sample may raise two worries. First, patterns of citations within one family of physics journals might not generalize to the patterns of all citations in physics, let alone to other disciplines. Second, their results may not be robust to changing the criteria for selecting the sample from the larger data set; e.g., what if we select all physicists with careers spanning at least 10 rather than 20 years? (However, note that most physicists in the full data set have short careers with few publications, and that therefore their sample excludes a larger fraction of researchers than of publications.) Sinatra et al. address these and other worries by replicating their main results for different sampling criteria, and for a different data set covering more scientific disciplines; see in particular their Supplementary Materials. I haven’t investigated whether their attempts to rebut these worries are convincing.
[21] More precisely, they assume that scientist i publishes a total number of N_i papers, with each paper’s impact being determined by independent draws from Q_i * p, with random variation in p being the same for all scientists. They then use maximum-likelihood estimation to fit a trivariate log-normal distribution in N_i, Q_i, p to their data. This results in a model where Q_i is independent of p (‘luck’) and only weakly correlated with N_i (‘productivity’). See Sinatra et al. (2016, pp. aaf5239-3f., sc. “Q-model”) for details.
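For illustration, here is a minimal sketch of this generative structure (the parameter values are placeholders, not Sinatra and colleagues’ fitted estimates):

```python
import math
import random

# Sketch of the Q-model structure described above: scientist i has a fixed
# factor Q_i; each paper's impact is Q_i * p, with the 'luck' term p drawn
# from the same log-normal distribution for every scientist.
random.seed(0)

def career(mu_q=0.0, sigma_q=0.5, mu_p=1.0, sigma_p=1.0, n_papers=30):
    Q = math.exp(random.gauss(mu_q, sigma_q))             # scientist-specific factor
    impacts = [Q * math.exp(random.gauss(mu_p, sigma_p))  # i.i.d. luck term p
               for _ in range(n_papers)]
    return Q, impacts

Q, impacts = career()
best = max(range(len(impacts)), key=lambda j: impacts[j])
print(round(Q, 2), round(impacts[best], 1), best)
# Because the p draws are i.i.d., the position of the highest-impact paper
# within a career is uniformly distributed under this structure.
```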
[22] However, note that Sinatra et al. (2016) always look at the number of citations 10 years after a publication. “[E]arly stages of a scientific career” here must therefore mean at least ten years after the first few publications.
[23] We must be careful not to prematurely identify the parameter Q with any specific conception of ability. Based on Sinatra and colleagues’ results, Q could be anything that varies between scientists but is constant within each career. It is perhaps more plausible that Q depends on, say, IQ than on height, but their analysis provides no specific reason to think that it does. Cf. also ibid., pp. aaf5239-6f.
[24] There is one result in Sinatra et al. (2016) which might at first glance – mistakenly – be taken to explain diminishing marginal returns of research funding, but it is not related to randomness. This is their finding that a scientist is more likely to publish their highest-impact paper within 20 years after their first publication (ibid., p. aaf5239-3, Fig. 2D). One might suspect this is because productivity or ability decrease over time, and worry that funding based on past successes would therefore select for scientists with an already diminished potential for impact. However, Sinatra et al. in fact find that productivity increases within a career, and that ability is constant. Their finding about the highest-impact work’s timing is merely an effect of few careers being long, with the location of a cut-off point after 20 years being an artefact of their sampling criteria (see Supplementary Materials, p. 42, Fig. S12).
[25] The literature disagrees on whether there ever are situations where all available options are exactly of equal value. Elster (1989, p. 54) provides the example of “the choice between identical cans of Campbell’s tomato soup”. Hofstee (1990, p. 746) objects that “[i]n practice, one would take the closest one and inspect its ultimate consumption date” and claims that “[f]or practical purposes, however, strict equioptimality is non-existent.”
[26] I am indebted to Avin (personal communication) for clearly expressing these points.
[27] Open Phil’s ‘Second Chance’ Program here just serves as an illustrative example of how a nonrandom strategy can have one of the same advantages of lotteries. I’m otherwise not familiar with the program, and in particular don’t claim that consideration of lotteries informed the design of this program or that avoiding cost for applicants was an important consideration.
Upvoted for posting original research. However, I'd strongly recommend posting a summary, with a link to the full text elsewhere, rather than posting the full text directly.
A post this long is likely to generate fewer comments (even if people might have been able to come up with good questions or ideas after just reading a summary), and it may be hard for a reader to judge whether they should invest in reading the whole thing (a summary can help someone make that judgment).
Rather than posting my thesis directly to my blog, I wrote a summary, and that wound up working out very well for me (I've had many readers whose work involved my thesis topic tell me the summary was useful to them, even though few of those read the full paper).
Hi Aaron, thank you for the suggestion. I agree that posting a more extensive summary would help readers decide if they should read the whole thing, and I will strongly consider doing so in case I ever plan to post similar things. For this specific post, I probably won't add a summary because my guess is that in this specific case the size of the beneficial effect doesn't justify the cost. (I do think extremely few people would use their time optimally by reading the post, mostly because it has no action-guiding conclusions and a low density of generally applicable insights.) I'm somewhat concerned that more people read this post than would be optimal just because there's some psychological pull toward reading whatever you clicked on, and that I could reduce the amount of time spent suboptimally by having a shorter summary here, with accessing the full text requiring an additional click. However, my hunch is that this harmful effect is sufficiently small. (Also, the cost to me would be unusually high because I have a large ugh field around this project and would really like to avoid spending any more time on it.) But do let me know if you think replacing this text with a summary is clearly warranted, and thank you again for the suggestion!
I still think you should write it. This looks like an important bit of information, but not worth the read, and I estimate a summary would increase the number of readers fivefold.
On the GreaterWrong version of the EA Forum, there's an automatically generated TOC. So that's an option for people who would strongly prefer a TOC.
I have been feeling the siren song of agent-based models recently (I think it seems a natural move in a lot of cases, because we are actually modelling agents), but your criticisms of them reminded me that they often don't pay for their complexity in better predictions. It seems quite a general and useful point, and perhaps could be extracted to a standalone post, if you had the time and inclination.
I know it wasn't a major area of focus for you, but do you have a vague impression of when randomisation might be a big win purely by reducing costs of evaluation? One particular case where it might be useful is funders where disbursement is bottlenecked by evaluation capacity. Do you have any pointers for useful places to start research on the idea?
Not really, I'm afraid. I'd expect that, due to the risk of inadvertent negative impacts and the large improvements available from weeding out obviously suboptimal options, a pure lottery will rarely be a good idea. How much effort to expend beyond weeding out clearly suboptimal options seems likely to depend on contextual information specific to the use case. I'm not sure how much there is to be said in general beyond platitudes along the lines of "invest time into explicit evaluation until the marginal value of information has diminished sufficiently".
Very interesting!
In your literature review you summarize the Smith and Winkler (2006) paper as "Prove that nonrandom, non-Bayesian decision strategies systematically overestimate the value of the selected option."
At first sight, this claim seems like it might be stronger than the one I took away from the paper (which is similar to what you write later in the text): if your decision strategy is to just choose the option you (naively) expect to be best, you will systematically overestimate the value of the selected option.
If you think the first claim is implied by the second (or by something in the paper I missed) in some sense, I'd love to learn about your arguments!
"In fact, I believe that choosing the winning option does maximize expected value if all measurements are unbiased and their reliability doesn’t vary too much."
I think you are basically right, but the number of available options also plays a role here. If you consider a lot of non-optimal options, for which your measurements are only slightly noisier than for the best option, you can still systematically underselect the best option. (For example, simulations suggest that with 99 N(0,1.1) and 1 N(0.1,1) variables, the last one will be maximal among the 100 only 0.7% of the time, despite having the highest expected value.)
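A quick simulation sketch of this setup (reading N(0, 1.1) as mean 0 and standard deviation 1.1) looks roughly like this:

```python
import random

random.seed(0)
trials, wins = 100_000, 0
for _ in range(trials):
    others = [random.gauss(0, 1.1) for _ in range(99)]  # 99 options with mean 0, sd 1.1
    best_option = random.gauss(0.1, 1.0)                # the option with the highest mean
    if best_option > max(others):
        wins += 1
print(wins / trials)  # should come out close to the ~0.7% mentioned above
```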
In this case, randomly taking one option would in fact have a higher expected value. (But it still seems very unclear how one would identify similar situations in reality, even if they existed.)
Some combination of moderately varying noise and lots of options seems like the most plausible condition, under which not taking the winning option might be better for some real world decisions.
On your first point: I agree that the paper just shows that, as you wrote, "if your decision strategy is to just choose the option you (naively) expect to be best, you will systematically overestimate the value of the selected option".
I also think that "just choose the option you (naively) expect to be best" is an example of a "nonrandom, non-Bayesian decision strategy". Now, the first sentence you quoted might reasonably be read to make the stronger claim that all nonrandom, non-Bayesian decision strategies have a certain property. However, the paper actually just shows that one of them does.
Is this what you were pointing to? If so, I'll edit the quoted sentence accordingly, but I first wanted to check if I understood you correctly.
In any case, thank you for your comment!
Yes, exactly. When first reading your summary I interpreted it as the "for all" claim.
Ok, thanks, I now say "Prove that a certain nonrandom, non-Bayesian ...".
On your second point: I think you're right, and that's a great example. I've added a link to your comment to the post.