I would like to discuss an idea about how simple forecasting tournaments could be used as a public participation tool to 1) increase the transparency and accuracy of public funding, 2) increase citizens' trust in governments, 3) improve the quality of political debate by building a database of forecasts and arguments, and 4) improve national foresight on emerging policy priorities over a medium- to long-term horizon.
With a broad version of this idea in mind, my colleagues at Czech Priorities and I have secured two grants for applied research (Use of forecasting tournaments in policymaking and a Methodology for early identification of megatrends). Czech public institutions are interested in the outcomes of both projects, which should provide useful data and practical insights on their own (see future posts). But this set-up also seems like a nice opportunity to maximize exploration value by validating additional research questions, e.g. regarding the feasibility of the idea outlined below.
We can make minor adjustments to the experimental design of both projects until March 2021. So the main goal of this post is to find EAs/Rationalists who would be interested in further exploring this idea with us over the next ~2 months, to better understand what evidence would be most useful, pre-register questions, and/or write a paper.
Short theory: Deliberation & Participation
Let's consider forecasting tournaments a method of public deliberation. Deliberation is a useful tool that may improve policymaking in various direct and indirect ways, but it is complicated to design the process robustly. Forecasting tournaments can be useful in structuring the process and providing better incentives to deliberate well, but they have problems of their own. Nonetheless, there seems to be a large space for exploration.
Representativeness of the population is important for the legitimacy of deliberation, but even with sortition or mini-publics, it is difficult to create truly representative groups. Nationwide participation methods such as referenda or public budgeting, which are more likely to be representative, often encounter little interest or give suboptimal results, because citizens are incentivized neither to be honest nor to do the research needed to be right.
Proposals such as Democracy Dollars (each person gets $200/year to give to a party or a cause of their choice) improve the incentives (and I think they are likely to be trialed on a national level somewhere soon), but they have the same problem as public budgeting: citizens are not incentivized to think too hard about how they spend. When applied to complex nationwide causes, this might increase public participation (an improvement over the status quo), but it won't otherwise produce much valuable info.
Proposal: Forecasting of Priorities
I suggest another mechanism, primarily for prioritization between precisely these complex societal causes. I'm describing here an ambitious version, where the organizer is a national government and the participants are all eligible voters (though participation is voluntary). However, the mechanism should work on smaller samples too, so it should scale from small groups all the way up to large corporations and nations, where the benefits over the status quo would be, I think, substantial.
The core of the mechanism is a forecasting tool, where each citizen receives virtual credit (let's say $200) each year (on their birthday, so that participation is spread out in time). After logging in, they see a long list of 20-40 public causes and challenges (such as longevity, corruption, legalization of drugs, mental health, better roads, etc.). Citizens can (anonymously) allocate credit at any time within the year, using quadratic voting, to any causes they consider a priority, and explain why or specify it further in a public comment. They can use different strategies to do this, which I will describe below. Once they allocate the credit, the amount actually goes to solving the cause (i.e. the government funds research on and implementation of solutions).
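To make the quadratic-voting step concrete, here is a minimal sketch in Python; the exact cost function is an assumption on my part (the standard quadratic rule), not a settled design choice:

```python
import math

CREDIT = 200  # annual virtual credit per citizen

def qv_cost(votes: int) -> int:
    """Quadratic voting: casting n votes for one cause costs n^2 credit."""
    return votes ** 2

# Going all-in on a single cause buys at most floor(sqrt(200)) = 14 votes...
max_votes_one_cause = math.isqrt(CREDIT)  # 14 (cost 196 <= 200)

# ...while spreading over 8 causes buys 5 votes each (8 * 5^2 = 200 credit),
# i.e. 40 votes of total influence. Concentration is expensive by design.
votes_per_cause = int(math.sqrt(CREDIT / 8))  # 5
total_spread_votes = 8 * votes_per_cause      # 40

print(max_votes_one_cause, total_spread_votes)
```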
To motivate due diligence, citizens receive financial rewards if they allocate credit to priorities that turn out to be considered priorities by an expert panel 3 years later (3y seems to be the longest horizon at which both forecasting and financial motivation still work). More specifically, the few % of participants with the best foresight (whose distributions of priorities in year Y were the closest to the distribution of priorities of an expert panel in year Y+3) receive a substantial financial reward from the government, kind of like a bounty for having caused a positive impact 3 years earlier. As the resolution (at time Y+3), a panel of experts is simply asked what the most important priorities are (probably via a real-time Delphi-like method, but more on that later).
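The post leaves the closeness metric open; as a minimal sketch, here is one way the reward rule could work, assuming an L1 (total variation) distance between allocation shares (the metric, the 2% cutoff, and all names below are illustrative assumptions):

```python
def l1_distance(citizen: dict, experts: dict) -> float:
    """Total mismatch between two priority distributions.
    Both dicts map cause -> share of credit, each summing to 1."""
    causes = set(citizen) | set(experts)
    return sum(abs(citizen.get(c, 0.0) - experts.get(c, 0.0)) for c in causes)

def top_forecasters(allocations: dict, experts: dict, top_frac: float = 0.02) -> list:
    """Rank participants by closeness of their year-Y allocation to the
    year-Y+3 expert distribution and return the best few % (the 'bounty')."""
    ranked = sorted(allocations, key=lambda p: l1_distance(allocations[p], experts))
    k = max(1, int(len(ranked) * top_frac))
    return ranked[:k]

# Toy usage: experts in year Y+3 end up prioritizing ai_safety heavily.
experts_y3 = {"ai_safety": 0.5, "mental_health": 0.3, "roads": 0.2}
allocations_y0 = {
    "alice": {"ai_safety": 0.6, "mental_health": 0.4},
    "bob": {"roads": 0.9, "corruption": 0.1},
}
print(top_forecasters(allocations_y0, experts_y3, top_frac=0.5))  # ['alice']
```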
As a hypothetical result, the government is happy because it effectively harnesses a lot of inputs about what to fund and only pays rewards to the most visionary inputs 3y later. Non-populist politicians are happy because they can tell their voters “your opinion matters” and now it's believable. The citizens are happy because they feel directly involved in policymaking and get educated in the process. NGOs hoping to improve public discourse are happy because there is a growing structured database of weighted arguments that get checked for accuracy 3 years later. Media are happy because now there is a constantly updating and easily understandable aggregate of what citizens actually think and want.
Two strategies
Now imagine that you have just received $200 of credit to allocate to causes. Depending on what you want, you can position yourself on a spectrum between two main strategies, both of which should be beneficial for both you and the government. My intuition is that around 70% of participating citizens in Western societies would go for the "Activist" strategy and 30% for the "Forecaster" strategy, but there is no evidence for this.
“Activist” strategy
- Your selfish motivation is to get benefits for yourself. You think of what you, your family, or your friends most need the government to fund, and allocate credit to that. In a comment, you specify what exactly you'd prefer to be solved within the cause. You are instrumentally rational. That's great.
- You also want to solve today's problems. To your friends, you explain that you are the kind of person who cares about problems that are here and now. They see it as a virtue. You get mainly social rewards. The government gets informed about your real needs.
“Forecaster” strategy
- Your selfish motivation is to win money and forecaster status. You think of what your society as a whole most needs to invest in, and allocate credit to that. In a comment, you explain why this cause should be prioritized. You are epistemically rational. That's great.
- You want to solve tomorrow's problems. To your friends, you explain that you are the kind of person who cares for others and for the best possible future. They see it as a virtue. As a result, you get both social and financial rewards (if you turn out to be right). The government gets your insights about the future.
Note that it is hard to discredit either of the two strategies as bad or stupid. The worst strategy would probably be to allocate all of your credit to a single cause, but quadratic voting limits the impact of such strategies. Since you can comment only while allocating credit, you could write hoaxes to actively cause harm, but you would have to fund (give some of your credit to) the very causes you want to harm.
Providing constructive critique is desirable, though, and it is treated in the same way as endorsements (e.g. others can see how you weight your arguments by how much credit you allocate). You are incentivized to critique only causes that you still consider important (since you know you fund them at the same time); perhaps you just want to criticize the methods of solving them. Your critique will be read by others, and probably also by the organizations that received the funding, so it can have an impact.
Discussion: Fixed point & Self-fulfilling prophecy
I have many raw pages on other specifications (e.g. how to choose recipients of the funding), possible tweaks (e.g. a cheaper amplifying research tool instead of a Delphi as the resolution), other potential benefits (e.g. CSR or donor funds could match the best forecasters), possible failures (e.g. clientelism, fake recipients of the funding), and explanations of why this design should be robust to those failures (e.g. the causes are too broad for insider trading, citizens can't decide who receives the funding or which solutions get implemented, and $200/year is not enough incentive for bad-cause advocates to run large campaigns all year).
In the remainder of this post, however, I would like to consider two predict-o-matic problems that could be important when we consider a nationwide application. I think they will turn out to be a feature rather than a bug, but tell me if this is wrong.
1) Fixed point problem
You are a rationalist using the Forecaster strategy. Your line of thought goes: "Well, it's obvious that cause X will be a priority 3y later, but many people already think the same -> they will give credit to it now -> it will get a lot of funding and get largely tackled in 3 years -> it will not be a priority." But this line of thought would be wrong if:
- The cause X is too big to be solved in 3 years (AI safety, climate change, fake news, wealth redistribution, animal suffering, air pollution, even a new highway, etc.)
- The amount of funding, even if prioritized heavily (and matched by external actors), is not large enough to get the cause solved rapidly, and
- You think that many other people with similar priorities have the same line of thought and act on it, so NOT acting on it would be a better strategy.
The first two conditions should apply here: the causes need to be quite general for other reasons too (comprehensibility, keeping the debate at the level of values rather than the feasibility of solutions, etc.), and even if the mechanism were applied in many countries, the funding would not be sufficient to solve them in 3 years.
Regarding the third condition, the reasonableness of this thinking depends on the assumed numbers of citizens in these four categories:
- Those who are wrong about the importance of X -> do not prioritize X
- Those who are right about X, but do not have this line of thought -> prioritize X
- Those who are right about X, have this line of thought, and act on it -> do not prioritize X
- Those who are right about X, have this line of thought, but think that many others act on it too, and therefore do not act on it -> prioritize X
This could be a problem with YES/NO answers or one-on-one elections, but in this mechanism you will always prioritize between many different causes, so there should always be more people who don't prioritize X (categories 1 and 3) than those who do (categories 2 and 4). The best strategy should then always be to prioritize what you actually believe.
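A toy head-count makes the bookkeeping concrete; the shares below are made up for illustration, since the real proportions are unknown:

```python
# Hypothetical shares of participants for cause X (illustrative only):
wrong_about_x = 0.60         # category 1 -> do not prioritize X
right_no_metathought = 0.25  # category 2 -> prioritize X
right_and_defect = 0.10      # category 3 -> do not prioritize X
right_sees_through = 0.05    # category 4 -> prioritize X

prioritizers = right_no_metathought + right_sees_through  # 0.30
non_prioritizers = wrong_about_x + right_and_defect       # 0.70
assert non_prioritizers > prioritizers
# X stays relatively underfunded, so category 3's fear of over-funding is
# unfounded and honestly reporting your belief remains the best move.
```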
2) Self-fulfilling prophecy
You distribute your credits, but before submitting, you think about whether it makes sense to invest time into writing good explanations and arguments in comments.
If you do, you might help others become good forecasters (especially if you already have forecaster status), which lowers your chances of being rewarded 3y later. But the benefits are much higher (even for you), precisely because of the self-fulfilling prophecy.
Others (who use the forecaster strategy) will be persuaded by your arguments -> the cause will gain traction and get more funding -> more people will get involved -> more articles will be written -> experts in Delphi will prioritize it 3y later -> you win.
The beauty of this is that it should work especially well with new, emerging causes that take 3 years to enter the Overton window and be seriously considered by policy experts. Wouldn't it be great to have precisely these causes funded 3y earlier?
Next steps: Research questions
Starting March 2021, we will test whether a diversified group of 150-250 people with financial motivation and basic forecasting training (representing the citizens who would choose the "forecaster" strategy) is actually able to predict the top 5-10 megatrends / grand societal challenges (from a long list of 20-40) that will be prioritized 3-5 years later by experts in a Delphi study.
The same Delphi will already be run in 2021 (by us) and then every 3-5 years (by whoever wins the Czech public tender, but using our methodology). With this in mind, I'm confident there are other research questions (with earlier resolution) that we could ask, but cooperation with people experienced in designing research would be useful.
Interested EAs/Rationalists, feel free to contact me (jan@ceskepriority.cz). Any other comments (especially on why similar mechanisms won't work) would also be great.
Substantive points
Wait, so citizens are incentivized to predict what experts will say? This seems a little bit weak, because experts can be arbitrarily removed from reality. You might think that, no, our experts have a great grasp of reality, but I'd intuitively be skeptical. As in, I don't really know that many people who have a good grasp of what the most pressing problems of the world are.
So in effect, if that's the case, then the key feedback loops of your system are the ones between experts using the Delphi system <> reality, and the loop between experts <> forecasters seems secondary. For example, if I'm asked what Eliezer Yudkowsky will say the world's top priority is in three years, I pretty much know that he's going to say "artificial intelligence", and if you ask me to predict what Greta Thunberg will say, I pretty much know that she's going to go with "climate change".
I think that eventually you'll need a cleverer system which has more contact with reality. I don't know how that system would look, though. Perhaps CSET has something to say here. In particular, they have a neat method of taking big picture questions and decomposing them into scenarios and then into smaller, more forecastable questions.
Anyways, despite this the first round seems like an interesting governance/forecasting experiment.
Also, 150-250 people seems like too few to find great forecasters. If you were optimizing for forecasting accuracy, you might be better off hiring a bunch of superforecasters.
Re: Predict-O-Matic problems, see some more here
Not allowing forecasters to suggest their own trends seems like an easy mistake to fix (perhaps with some very cursory review of suggestions).
Nitpicks:
This may have the problem that once the public identifies a "leader", either a very good forecaster or a persuasive pundit, they can just copy their forecasts. As a result, the claim that the government "effectively harnesses a lot of inputs" seems like an overestimate; you wouldn't be harnessing that many inputs after all.
This depends on how much of the budget is chosen this way. In the worst case scenario, this gives a veneer of respectability to a process which only lets citizens decide over a very small portion of the budget.
Yes, there are not many experts with this kind of grasp, but a Delphi done by a diversified group of experts from various fields currently seems to be the best method for identifying megatrends (though some methods of text analysis, technological forecasting, or serious games can help). Only the expertise represented in the group will be known in advance, not the identities of the experts.
"What are the top national/world priorities" is usually so complex, that it will remain to be a mostly subjective judgment. Then, how else would you resolve it than by looking for some kind of future consensus?
But I agree that even if the individual experts are not known, their biases could be predictable, especially if the pool of relevant local experts is small or there is a lot of academic inbreeding. This could be addressed by lowering the bar for expertise (e.g. involving junior experts such as Ph.D. students/postdocs in the same fields) so that different experts participate in the resolution Delphi each year.
If the high cost and length of a resolution Delphi turn out to be a problem (I suppose they will), those junior experts could instead participate in a quick forecasting tournament on "what would senior experts say if we ran a Delphi next month?". One in four of these tournaments would be randomly followed by an actual Delphi, with rewards 4x higher, so the expected reward per tournament is unchanged (1/4 × 4R = R). But this adds a lot of complexity.
Thanks! We are in touch with CSET and I think their approach is super useful. Hopefully, we'll be able to specify some more research questions together before we start the trials.
Yeah, that's a great point. If the leader is consistently a good forecaster and a lot of people copy them (though probably not more than a couple of % of participants in the case of widespread adoption), there are fewer info inputs, but it has other benefits (a lot of people now feel ownership of the right causes, the causes gain traction, etc.). There will also be influential "activists" who get copied a lot (it's probably unrealistic to prevent everyone from revealing their real-life identity if they want to), but since there is cash at stake and no direct social incentive (unlike with e.g. retweeting), I think most people will be more cautious about whose priorities they copy.
A small portion of the budget (e.g. 1%) would still be an improvement. Most citizens would not think about how little of the budget they allocate, but rather that they are allocating a non-negligible $200, and they would feel like they actually participated in the whole political process, not only in 1% of it.
You could decompose that complex question into smaller questions which are more forecastable, and forecast those questions instead, in a similar way to what CSET is doing for geopolitical scenarios.
This might require infrastructure to create and answer a large number of forecasting questions efficiently, and it will require a good ontology of "priorities/mega-trends" (so that most possible new priorities are included and forecasted), as well as a way to update that ontology.
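As a rough sketch of what such an ontology plus decomposition could look like (the structure, field names, and example questions below are all hypothetical, my own invention rather than CSET's method):

```python
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    """A narrow, resolvable question derived from a broad priority."""
    text: str
    resolution_date: str    # when it can be scored against reality
    resolution_source: str  # e.g. a public statistic, not expert opinion

@dataclass
class Priority:
    """One node in the priorities/mega-trends ontology."""
    name: str
    sub_questions: list = field(default_factory=list)

# Hypothetical decomposition of one broad cause into forecastable pieces:
air_pollution = Priority(
    name="air pollution",
    sub_questions=[
        SubQuestion("Will annual mean PM2.5 in Prague fall below X ug/m3?",
                    "2024-01-01", "national air-quality registry"),
        SubQuestion("Will coal's share of electricity drop below Y%?",
                    "2024-01-01", "energy ministry statistics"),
    ],
)
```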
Have you considered that you're trying to do too many things at the same time?
Possibly, yes. It could be split into two separate mechanisms: 1) a public budgeting tool using quadratic voting for what I want governments to fund now, and 2) a forecasting tournament/prediction market for what the data/consensus about national priorities will be 3y later (without knowing forecasters' prior performance, the multiple-choice surprising-popularity approach could also be very relevant here). I see benefits in trying to merge these and wanted to put the idea out here, but yes, I'm totally in favor of more experimenting with these ideas separately; that's what we hope to do in our Megatrends project :)
Very cool idea, would love to see this implemented. Some thoughts while reading:
Don't they have plenty of that already, and aren't further pressures actually negative if they think they know best?
FWIW, I feel like I rarely observe non-populist politicians running on a direct-democracy platform, and in the one case I remember (the German Piratenpartei), they weren't successful.
Who are the experts? I expect this to cause controversy. Even without this mechanism, I already perceive a lot of controversy on fairly clear-cut questions, even in countries less polarized than the US, like Germany (e.g. on nuclear energy, GMOs, rent control). Maybe this could be circumvented by letting the population decide it? Or at least their elected representatives? I've stumbled upon the "Bayesian Truth Serum" mechanism, which might be useful for eliciting non-measurable outcomes. https://www.overcomingbias.com/2017/01/surprising-popularity.html
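For readers who haven't clicked through: the linked post is about the "surprisingly popular" answer rule (Prelec et al., 2017), a simplified relative of Bayesian Truth Serum. A minimal sketch of the selection rule, with my own toy data:

```python
def surprisingly_popular(votes: list, predictions: list) -> str:
    """Pick the answer whose actual popularity most exceeds the popularity
    the crowd predicted for it.

    votes:       each respondent's own answer
    predictions: per respondent, a dict mapping answer -> predicted fraction
                 of respondents giving that answer
    """
    answers = set(votes)
    actual = {a: votes.count(a) / len(votes) for a in answers}
    predicted = {a: sum(p.get(a, 0.0) for p in predictions) / len(predictions)
                 for a in answers}
    return max(answers, key=lambda a: actual[a] - predicted[a])

# Toy example: the majority answers "yes", but nearly everyone predicts a
# "yes" landslide, so the minority "no" is surprisingly popular and wins.
votes = ["yes", "yes", "yes", "no", "no"]
predictions = [{"yes": 0.9, "no": 0.1}] * 5
print(surprisingly_popular(votes, predictions))  # "no"
```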
Maybe this could be a useful use case for integrating an Elicit forecast, as described here: https://forum.effectivealtruism.org/posts/YgbPSyTvft6EcKnWm/forum-update-new-features-december-2020#Elicit_predictions

FWIW, I'd expect even more "Activists", more like 95% maybe? Only being able to predict which of ~20 categories some group of "experts" will find most important feels demotivating to me, even though I do forecasts regularly. This might change with the amount of monetary compensation, though I expect this wouldn't motivate too many people, maybe 1 in 100?
If the granularity is on the level of "longevity, corruption, legalization of drugs, mental health, better roads, etc.", I don't expect people to get hoaxy on those. I assume they'd get filtered beforehand to be kind of common-sensical and maybe even boring to think about. Speaking of granularity, I wonder whether this would even be enough to distinguish good from lucky forecasters, especially when there are so many participants.
ETA: Right, and congrats on getting the grants approved! Do they come from the Czech government?
Yes, but a lot of it seems to be input from lobbies, interest groups, or people who are mostly virtue-signaling to their peers; honest citizen participation (citizen assemblies etc.) is not that common... In this case, the government pre-commits to allocating only a small part of the budget accordingly; apart from that, politicians can still do what they think is best.
Thanks for sharing! The Nature study that Robin Hanson talks about is pretty relevant. But in our mechanism, the participants are predicting expert consensus (not their own consensus), so we don't need to make it harder for them to coordinate their answers; we just have to make sure they don't know who the experts in the Delphi 3-5 years later will be, so that they can't influence them.
Also, unlike in the Surprising Popularity mechanism, if you are confident that only you and a few others know the truth, your incentive is not to keep going with the contemporary consensus but to actually go with your contrarian opinion, especially when it is likely to become accepted 3-5 years later (and experts are likely to accept it earlier than the majority of the public).
If "what do I support" becomes a socially useful topic to mention to your friends, this social incentive might be more important than the financial incentive for choosing the forecaster strategy. But probably you´re right, there would be less the 30% forecasters.
Right, we need to find the level of granularity between "boring to most" and "too difficult for most". I think there are already pretty good setups and scoring mechanisms to eliminate luck, e.g. forecasting a full probability distribution and being rewarded based on how much you improved the current aggregate. But yes, this needs more research.
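As a minimal sketch of that kind of reward rule (the Brier-based scoring and the mixing weight below are assumptions for illustration, not our settled design):

```python
def brier(forecast: dict, outcome: str) -> float:
    """Brier score of a probability distribution over outcomes (lower is better)."""
    return sum((p - (1.0 if o == outcome else 0.0)) ** 2
               for o, p in forecast.items())

def aggregate_improvement(crowd: dict, forecast: dict, outcome: str,
                          weight: float = 0.1) -> float:
    """Reward = how much mixing your forecast into the current crowd
    aggregate improves its Brier score once the outcome is known.
    Staying positive across many questions is hard to do by luck alone."""
    outcomes = set(crowd) | set(forecast)
    mixed = {o: (1 - weight) * crowd.get(o, 0.0) + weight * forecast.get(o, 0.0)
             for o in outcomes}
    return brier(crowd, outcome) - brier(mixed, outcome)

# Toy usage: the crowd favours the wrong cause; your contrarian forecast
# nudges the aggregate toward the realized outcome, earning a positive reward.
crowd = {"ai_safety": 0.4, "roads": 0.6}
you = {"ai_safety": 0.9, "roads": 0.1}
print(aggregate_improvement(crowd, you, outcome="ai_safety"))  # ~0.115 > 0
```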
Yes, the Technology Agency of the Czech Republic.
Thanks, interesting points. Yes, predicting what unknown experts will conclude seems reasonable to me (though NunoSempere's points also seem sensible). Looking forward to reading your next update!