
The most important questions in forecasting tend to resist precise definition: questions like “Will we be ready for AGI when it arrives?” or “How are China’s AI capabilities trending overall?” These big-picture questions are more consequential than “How will Alibaba’s Qwen model perform on the Chatbot Arena?”, but where would you even begin to operationalize something like AGI readiness in a rigorous way?

We’re experimenting with an idea that bypasses this issue, which we’re calling Indexes.

An index takes a vague question, like “How ready will we be for AGI in 2030?”, and gives a quantitative answer on a -100 to 100 scale. We get this quantitative answer by taking forecasts on a set of questions, identified by the index author (a person or group) as collectively pointing at a nebulous but important concept like AGI readiness, and combining them into a single dimensionless number. The index author assigns a weight to each question indicating (a) how informative it is relative to the others and (b) whether learning that it resolved Yes or No should make the index go up or down. What you see when you look at a Metaculus Index, then, is in some sense a forecast of what the index will be when all the questions are resolved. When the index goes up, that means forecasters as a whole believe things are looking better for 2030 than they previously expected.
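To make the weighted-average idea concrete, here is a minimal Python sketch. The exact formula Metaculus uses is not spelled out in this post, so everything here is an assumption made for illustration: each component is a binary question with a Yes probability, weights are signed (positive if Yes should raise the index, negative if it should lower it), probabilities are rescaled from [0, 1] to [-1, 1], and the result is normalized by the total absolute weight so it lands on the -100 to 100 scale. The names `Component` and `index_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One question in a hypothetical index (illustrative, not the Metaculus schema)."""
    name: str
    probability: float  # forecast probability that the question resolves Yes, in [0, 1]
    weight: float       # signed: > 0 if Yes should raise the index, < 0 if it should lower it


def index_value(components):
    """Weighted average of rescaled forecasts, on a -100 to 100 scale.

    Assumption: each probability p is mapped to 2p - 1, so a certain Yes is +1,
    a certain No is -1, and a coin flip contributes nothing.
    """
    total_weight = sum(abs(c.weight) for c in components)
    if total_weight == 0:
        raise ValueError("at least one component needs a non-zero weight")
    for c in components:
        if not 0.0 <= c.probability <= 1.0:
            raise ValueError(f"probability out of range for {c.name!r}")
    score = sum(c.weight * (2 * c.probability - 1) for c in components)
    return 100 * score / total_weight


# Made-up questions, forecasts, and weights, purely for illustration.
example = [
    Component("Hypothetical question A", probability=0.50, weight=3),
    Component("Hypothetical question B", probability=0.75, weight=2),
    Component("Hypothetical question C", probability=0.25, weight=-1),
]
print(index_value(example))
```

Under these assumptions, a question forecast at 50% leaves the index unchanged, and the index reaches 100 only if every positively weighted question is certain to resolve Yes and every negatively weighted question is certain to resolve No.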

Enough theory. We’ve launched our flagship index, which you can forecast on now: AGI Readiness Index

Example AGI Readiness Index
These are the tentative questions and weights from an index we developed at The Curve. The data are fake; this is just to illustrate what an index will look like. The index value is the weighted average of the forecasts on its component questions.

The questions and weights in this figure are the preliminary results of a workshop we conducted at The Curve in November. We asked each participant to answer the prompt:

Assume AGI arrives in 2030. If you could ask an oracle three questions about the world right before AGI arrives, what would they be?

The rest of the workshop was spent specifying and winnowing the questions until we had a set of eight that participants rated most informative. There was, of course, disagreement: about how likely a question was to resolve positively (if you think you already know the answer, asking is a waste of an oracle question), about how correlated the questions were with each other (you want a set of questions that are as complementary and independent of each other as possible), and about what constitutes readiness. Once all the questions were on the table, participants had the opportunity to critique and defend each other’s questions and, finally, to assign weights. Each participant had ten points to distribute, and from that allocation eight questions emerged as the key axes of readiness. Tied for first were legislation requiring compliance with RSPs, third-party auditing, and transparency; and a public list of incidents related to AI safety.
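The point-allocation step can be sketched in a few lines of Python. This is only an illustration of the procedure as described, not the tooling we used at the workshop: the function name `tally_points`, the participant labels, and the question labels are all hypothetical, and the allocations are invented.

```python
from collections import Counter


def tally_points(allocations, budget=10, keep=8):
    """Sum each participant's points per question and return the top `keep` questions.

    `allocations` maps a participant to a dict of {question: points}.
    Assumption: every participant must spend exactly `budget` points.
    """
    totals = Counter()
    for participant, points in allocations.items():
        spent = sum(points.values())
        if spent != budget:
            raise ValueError(f"{participant} allocated {spent} points, expected {budget}")
        totals.update(points)  # adds this participant's points to the running totals
    return totals.most_common(keep)


# Invented allocations, for illustration only.
allocations = {
    "participant_1": {"question_a": 5, "question_b": 3, "question_c": 2},
    "participant_2": {"question_a": 4, "question_c": 6},
    "participant_3": {"question_b": 10},
}
for question, points in tally_points(allocations, keep=2):
    print(question, points)
```

The summed points could then serve directly as the question weights, with a sign attached by the index author to say whether a Yes resolution should push the index up or down.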

We hope you disagree with these weights! It would be a sorry index that received no push-back. What questions are missing? Our dream is for indexes to be a forum to talk about the big picture. Where do we see ourselves in 2030? Are we headed in the right direction? How would we know? At the time of writing, we have two indexes in the pipeline: AI for Public Good, in collaboration with AI Palace, and a China Capabilities Index, with the Simon Institute for Longterm Governance.

Do you have an idea for an index? Get in touch!

Credits

This project is partly inspired by the Forecasting Research Institute’s report Conditional Trees: A Method for Generating Informative Questions about Complex Topics, which presents a metric and method for identifying “high-value” forecasting questions with respect to an ultimate question. Our goal here is to curate these questions quickly, without putting the index author through the exercise of conditional forecasting and without necessarily having to strictly define the ultimate question. There’s nothing stopping the question weights in an index from being based on a quantitative metric like value of information, but they can also be less rigorous, or even completely "vibes-based."

We’ve also been heavily influenced by Cultivate Labs’ Issue Decomposition approach, used by the RAND Forecasting Initiative, in which top-level questions are broken down into drivers, sub-drivers, and finally forecasting questions aka "signals." Our indexes are one way to recompose forecasting questions into something that can inform strategy and give us a look at the overall trajectory.


Comments (1)

Executive summary: The concept of "Indexes" is introduced as a method to quantify vague yet crucial forecasting questions, such as AGI readiness, by aggregating weighted answers to a curated set of sub-questions, enabling actionable insights into nebulous topics. 

Key points:

  1. Indexes aim to operationalize vague, consequential questions (e.g., AGI readiness) by providing a numerical scale (-100 to 100) based on weighted forecasts of related sub-questions.
  2. Index construction involves selecting, specifying, and weighting sub-questions deemed informative by index authors, ensuring complementary and independent insights.
  3. A flagship example, the "AGI Readiness Index," uses eight key axes such as AI legislation, transparency, and incident reporting, derived from expert workshops.
  4. Indexes are intended to provoke discussion and critique, fostering collaboration to refine questions, weights, and perspectives.
  5. Upcoming indexes include "AI for Public Good" and "China Capabilities Index," aiming to broaden the scope of big-picture insights.
  6. Inspired by methodologies like the Forecasting Research Institute’s "Conditional Trees" and Cultivate Labs’ decomposition approach, Indexes balance rigor with practical, flexible implementation.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
