Announcing the Future Fund's AI Worldview Prize

Nick_Beckstead; leopold; William_MacAskill; ketanrama; ab

This is a linkpost for https://ftxfuturefund.org/announcing-the-future-funds-ai-worldview-prize/

Update: I, Nick Beckstead, no longer work at the Future Fund and am writing this update purely in a personal capacity. Since the Future Fund team has resigned and FTX has filed for bankruptcy, it now seems very unlikely that these prizes will be paid out. I'm very sad about the disruption that this may cause to contest participants.

I would encourage participants who were working on entries for this prize competition to save their work and submit it to Open Philanthropy's own AI Worldview Contest in 2023.

Today we are announcing a competition with prizes ranging from $15k to $1.5M for work that informs the Future Fund's fundamental assumptions about the future of AI, or is informative to a panel of superforecaster judges selected by Good Judgment Inc. These prizes will be open for three months—until Dec 23—after which we may change or discontinue them at our discretion. We have two reasons for launching these prizes.

First, we hope to expose our assumptions about the future of AI to intense external scrutiny and improve them. We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century, and it is consequently one of our top funding priorities. Yet our philanthropic interest in AI is fundamentally dependent on a number of very difficult judgment calls, which we think have been inadequately scrutinized by others.

As a result, we think it's really possible that:

all of this AI stuff is a misguided sideshow,
we should be even more focused on AI, or
a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem.

If any of those three options is right—and we strongly suspect at least one of them is—we want to learn about it as quickly as possible because it would change how we allocate hundreds of millions of dollars (or more) and help us better serve our mission of improving humanity's longterm prospects.

Second, we are aiming to do bold and decisive tests of prize-based philanthropy, as part of our more general aim of testing highly scalable approaches to funding. We think these prizes contribute to that work. If these prizes work, it will be a large update in favor of this approach being capable of surfacing valuable knowledge that could affect our prioritization. If they don't work, that could be an update against this approach surfacing such knowledge (depending how it plays out).

The rest of this post will:

Explain the beliefs that, if altered, would dramatically affect our approach to grantmaking
Describe the conditions under which our prizes will pay out
Describe in basic terms how we arrived at our beliefs and cover other clarifications

Prize conditions

On our areas of interest page, we introduce our core concerns about AI as follows:

We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century. AI is already posing serious challenges: transparency, interpretability, algorithmic bias, and robustness, to name just a few. Before too long, advanced AI could automate the process of scientific and technological discovery, leading to economic growth rates well over 10% per year (see Aghion et al 2017, this post, and Davidson 2021).
As a result, our world could soon look radically different. With the help of advanced AI, we could make enormous progress toward ending global poverty, animal suffering, early death and debilitating disease. But two formidable new problems for humanity could also arise:
Loss of control to AI systems
Advanced AI systems might acquire undesirable objectives and pursue power in unintended ways, causing humans to lose all or most of their influence over the future.
Concentration of power
Actors with an edge in advanced AI technology could acquire massive power and influence; if they misuse this technology, they could inflict lasting damage on humanity’s long-term future.
For more on these problems, we recommend Holden Karnofsky’s “Most Important Century,” Nick Bostrom’s Superintelligence, and Joseph Carlsmith’s “Is power-seeking AI an existential risk?”.

Here is a table identifying various questions about these scenarios that we believe are central, our current position on the question (for the sake of concreteness), and alternative positions that would significantly alter the Future Fund's thinking about the future of AI^[1]^[2]:

Proposition	Current position	Lower prize threshold	Upper prize threshold
“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI	15%	7%	35%
AGI will be developed by January 1, 2043	20%	10%	45%
AGI will be developed by January 1, 2100	60%	30%	N/A

Future Fund will award a prize of $500k to anyone who publishes analysis that moves these probabilities to the lower or upper prize threshold.^[3] To qualify, please publish your work (or publish a post linking to it) on the Effective Altruism Forum, the AI Alignment Forum, or LessWrong with a "Future Fund worldview prize" tag. You can also participate in the contest by publishing your submission somewhere else (e.g. arXiv or your blog) and filling out this submission form. We will then linkpost/crosspost to your submission on the EA Forum.

We will award larger prizes for larger changes to these probabilities, as follows:

$1.5M for moving “P(misalignment x-risk|AGI)” below 3% or above 75%
$1.5M for moving “AGI will be developed by January 1, 2043” below 3% or above 75%

We will award prizes of intermediate size for intermediate updates at our discretion.
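
To make the thresholds concrete, here is a minimal Python sketch of how the table and the larger-prize list above map an updated probability to a headline prize. The function name `headline_prize`, the dictionary layout, and the handling of boundary values (reaching a threshold counts for $500k, while $1.5M requires going strictly below or above the stated figure) are assumptions made for illustration. Actual awards, including intermediate ones, remain at the Future Fund's discretion.

```python
# Illustrative only: encodes the published thresholds, not the judging process.
# Probabilities are given in percent so that comparisons stay exact.
THRESHOLDS = {
    "misalignment_xrisk_given_agi": {
        "current": 15, "lower": 7, "upper": 35, "big_lower": 3, "big_upper": 75,
    },
    "agi_by_2043": {
        "current": 20, "lower": 10, "upper": 45, "big_lower": 3, "big_upper": 75,
    },
    # The table lists no upper threshold and no $1.5M tier for this question.
    "agi_by_2100": {
        "current": 60, "lower": 30, "upper": None, "big_lower": None, "big_upper": None,
    },
}


def headline_prize(question, new_pct):
    """Return the headline prize (in dollars) implied by an updated probability."""
    t = THRESHOLDS[question]
    # $1.5M tier: below 3% or above 75% (assumed strict inequalities).
    if t["big_lower"] is not None and new_pct < t["big_lower"]:
        return 1_500_000
    if t["big_upper"] is not None and new_pct > t["big_upper"]:
        return 1_500_000
    # $500k tier: the probability reaches the lower or upper threshold.
    if new_pct <= t["lower"]:
        return 500_000
    if t["upper"] is not None and new_pct >= t["upper"]:
        return 500_000
    return 0
```

For example, `headline_prize("agi_by_2043", 10)` returns `500000`, while `headline_prize("agi_by_2100", 90)` returns `0` because the table gives no upper threshold for that question.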

We are also offering:

A $200k prize for publishing any significant original analysis^[4] which we consider the new canonical reference on any one of the above questions, even if it does not move our current position beyond a relevant threshold. Past works that would have qualified for this prize include: Yudkowsky 2008, Superintelligence, Cotra 2020, Carlsmith 2021, and Karnofsky's Most Important Century series. (While the above sources are lengthy, we'd prefer to offer a prize for a brief but persuasive argument.)
A $200k prize for publishing any analysis which we consider the canonical critique of the current position highlighted above on any of the above questions, even if it does not move our position beyond a relevant threshold. Past works that might have qualified for this prize include: Hanson 2011, Karnofsky 2012, and Garfinkel 2021.
At a minimum, we will award $50k to the three published analyses that most inform the Future Fund's overall perspective on these issues, and $15k for the next 3-10 most promising contributions to the prize competition. (I.e., we will award a minimum of 6 prizes. If some of the larger prizes are claimed, we may accordingly award fewer of these prizes.)

As a check/balance on our reasonableness as judges, a panel of superforecaster judges will independently review a subset of highly upvoted/nominated contest entries with the aim of identifying any contestant who did not receive a prize, but would have if the superforecasters were running the contest themselves (e.g., an entrant that sufficiently shifted the superforecasters’ credences).

For the $500k-$1.5M prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award $200k (or more) for up to one entrant in each category (existential risk conditional on AGI by 2070, AGI by 2043, AGI by 2100), upon recommendation of the superforecaster judge panel.
For the $15k-200k prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award additional prizes upon recommendation of the superforecaster judge panel.

The superforecaster judges will be selected by Good Judgment Inc. and will render their verdicts autonomously. While superforecasters have only been demonstrated to have superior prediction track records for shorter-term events, we think of them as a lay jury of smart, calibrated, impartial people.

Our hope is that potential applicants who are confident in the strength of their arguments, but skeptical of our ability to judge impartially, will nonetheless believe that the superforecaster jury will plausibly judge their arguments fairly. After all, entrants could reasonably doubt that people who have spent tens of millions of dollars funding this area would be willing to acknowledge it if that turned out to be a mistake.

Details and fine print

Only original work published after our prize is announced is eligible to win.
We do not plan to read everything written with the aim of claiming these prizes. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. We also do not plan to explain in individual cases why we did or did not engage seriously.
If you have questions about the prizes, please ask them as comments on this post. We do not plan to respond to individual questions over email.
All prizes will be awarded at the final discretion of the Future Fund. Our published decisions will be final and not subject to appeal. We also won't be able to explain in individual cases why we did not offer a prize.
Prizes will be awarded equally to coauthors unless the post indicates some other split. At our discretion, the Future Fund may provide partial credit across different entries if they together trigger a prize condition.
If a single person does research leading to multiple updates, Future Fund may—at its discretion—award the single largest prize for which the analysis is eligible (rather than the sum of all such prizes).
We will not offer awards to any analysis that we believe was net negative to publish due to information hazards, even if it moves our probabilities significantly and is otherwise excellent.
At most one prize will be awarded for each of the largest prize categories ($500k and $1.5M). (If e.g. two works convince us to assign < 3% subjective probability to AGI being developed in the next 20 years, we’ll award the prize to the most convincing piece (or split it in case of a tie).)

For the first two weeks after it is announced—until October 7—the rules and conditions of the prize competition may be changed at the discretion of the Future Fund. After that, we reserve the right to clarify the conditions of the prizes wherever they are unclear or have wacky unintended results.

Information hazards

Please be careful not to publish information that would be net harmful to publish. We think people should not publish very concrete proposals for how to build AGI (if they know of them), or things that are too close to that.

If you are worried that publishing your analysis would be net harmful due to information hazards, we encourage you to a) write your draft and then b) ask about this using the “REQUEST FEEDBACK” feature on the Effective Altruism Forum or LessWrong pages (it appears on the draft post page, just before you would normally publish a post).

The moderators have agreed to help with this.

If you feel strongly that your analysis should not be made public due to information hazards, you may submit your prize entry through this form.

Some clarifications and answers to anticipated questions

What do you mean by AGI?

Imagine a world where cheap AI systems are fully substitutable for human labor. E.g., for any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less. This includes entirely AI-run companies, with AI managers and AI workers and everything being done by AIs.

How large of an economic transformation would follow? Our guess is that it would be pretty large (see Aghion et al 2017, this post, and Davidson 2021), but - to the extent it is relevant - we want people competing for this prize to make whatever assumptions seem right to them.

For purposes of our definitions, we’ll count it as AGI being developed if there are AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in such a world. Some caveats/clarifications worth noticing:

A comparably large economic transformation could be achieved even if the AI systems couldn’t substitute for literally 100% of jobs, including providing emotional support. E.g., Karnofsky’s notion of PASTA would probably count (though that is an empirical question), and possibly some other things would count as well.
If weird enough things happened, the metric of GWP might stop being indicative in the way it normally is, so we want to make sure people are thinking about the overall level of weirdness rather than being attached to a specific measure or observation. E.g., causing human extinction or drastically limiting humanity’s future potential may not show up as rapid GDP growth, but automatically counts for the purposes of this definition.

Why are you starting with such large prizes?

We really want to get closer to the truth on these issues quickly. Better answers to these questions could prevent us from wasting hundreds of millions of dollars (or more) and years of effort on our part.

We could start with smaller prizes, but we’re interested in running bold and decisive tests of prizes as a philanthropic mechanism.

A further consideration is that sometimes people argue that all of this futurist speculation about AI is really dumb, and that its errors could be readily explained by experts who can't be bothered to seriously engage with these questions. These prizes will hopefully test whether this theory is true.

Can you say more about why you hold the views that you do on these issues, and what might move you?

I (Nick Beckstead) will answer these questions on my own behalf without speaking for the Future Fund as a whole.

For "Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI." I am pretty sympathetic to the analysis of Joe Carlsmith here. I think Joe's estimates of the relevant probabilities are pretty reasonable (though the bottom line is perhaps somewhat low) and if someone convinced me that the probabilities on the premises in his argument should be much higher or lower I'd probably update. There are a number of reviews of Joe Carlsmith's work that were helpful to varying degrees but would not have won large prizes in this competition.

For assigning odds to AGI being developed in the next 20 years, I am blending a number of intuitive models to arrive at this estimate. They are mostly driven by a few high-level considerations:

I think computers will eventually be able to do things brains can do. I've believed this for a long time, but if I were going to point to one article as a reference point I'd choose Carlsmith 2020.
Priors that seem natural to me ("beta-geometric distributions") start us out with non-trivial probability of developing AGI in the next 20 years, before considering more detailed models. I've also believed this for a long time, but I think Davidson 2021's version is the best, and he gives 8% to AGI by 2036 through this method as a central estimate.
I assign substantial probability to continued hardware progress, algorithmic progress, and other progress that fuels AGI development over the coming decades. I'm less sure this will continue many decades into the future, so I assign somewhat more probability to AGI in sooner decades than later decades.
Under these conditions, I think we'll pass some limits—e.g. approaching hardware that's getting close to as good as we're ever going to get—and develop AGI if we're ever going to develop it.
I'm extremely uncertain about the hardware requirements for AGI (at the point where it's actually developed by humans), to a point where my position is roughly "I dunno, log uniform distribution over anything from the amount of compute used by the brain to a few orders of magnitude less than evolution." Cotra 2020—which considers this question much more deeply—has a similar bottom line on this. (Though her updated timelines are shorter.)
I'm impressed by the progress in deep learning to the point where I don't think we can rule out AGI even in the next 5-10 years, but I'm not impressed enough by any positive argument for such short timelines to move dramatically away from any of the above models.

(I'm heavily citing reports from Open Philanthropy here because a) I think they did great work and b) I'm familiar with it. I also recommend this piece by Holden Karnofsky, which brings a lot of this work—and other work—together.)
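
As a rough illustration of the "beta-geometric" priors mentioned in the list above, the following Python sketch computes the chance of at least one success in a future window of trials, given that all trials so far have failed. It assumes a Beta(alpha, beta) prior over a fixed per-trial success probability. The function name, the default parameters, and the idea of treating each calendar year as one trial are placeholders chosen for illustration; they are not the inputs used in Davidson 2021, and the sketch makes no attempt to reproduce that report's figures.

```python
def p_success_within(n_future, n_failed, alpha=1.0, beta=1.0):
    """P(at least one success in the next n_future trials | n_failed failures so far).

    Assumes a Beta(alpha, beta) prior on the per-trial success probability.
    After k failures the posterior is Beta(alpha, beta + k), so the chance
    that the next trial also fails is (beta + k) / (alpha + beta + k).
    """
    p_no_success = 1.0
    for i in range(n_future):
        k = n_failed + i  # failures observed before this trial
        p_no_success *= (beta + k) / (alpha + beta + k)
    return 1.0 - p_no_success
```

With the uniform prior (`alpha=1`, `beta=1`), nine failed trials, and a ten-trial window, `p_success_within(10, 9)` is about 0.5. The answer is sensitive to how a "trial" is defined and to the prior chance of success on the first trial, which is why the choice of those inputs matters so much in this kind of model.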

In short, you can model me as having a roughly trapezoidal probability density function over developing AGI from now to 2100, with a long tail extending beyond that point. There is about 2x as much weight at the beginning of the distribution as there is at the end of the century. The long tail includes a) insufficient data/hardware/humans not smart enough to solve it yet, b) technological stagnation/hardware stagnation, and c) reasons it's hard that I haven't thought of. The microfoundation of the probability density function could be: a) exponentially increasing inputs to AGI, b) log returns to AGI development on the key inputs, c) pricing in some expected slowdown in the exponentially increasing inputs over time, and d) slow updating toward increased difficulty of the problem as time goes on, but I stand by the distribution more than the microfoundation.
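
The trapezoidal description can be turned into numbers with a short Python sketch. It assumes the density falls linearly from twice its end-of-century height to that height, that "now" means the start of 2022, and that the total mass before 2100 is the 60% given in the table. The function name and all parameter defaults are illustrative assumptions, and the long tail beyond 2100 is ignored.

```python
def p_agi_by(year, start=2022.0, end=2100.0, mass_by_end=0.60, ratio=2.0):
    """Cumulative probability of AGI by `year` under a linear (trapezoidal) density.

    The density is ratio * h at `start` and h at `end`. Years after `end`
    are clamped, so the long tail beyond 2100 is not modeled here.
    """
    span = end - start
    t = min(max(year - start, 0.0), span)
    # Pick h so that the trapezoid's area over [start, end] equals mass_by_end.
    h = mass_by_end / (0.5 * (ratio + 1.0) * span)
    # Integral of h * (ratio - (ratio - 1) * s / span) for s from 0 to t.
    return h * (ratio * t - (ratio - 1.0) * t * t / (2.0 * span))
```

Under these assumptions, `p_agi_by(2043)` comes out at about 0.20, which is consistent with the 20% figure for 2043 in the table. A different starting date or a non-linear shape would shift this number somewhat.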

What do you think could substantially alter your views on these issues?

We don't know. Most of all we'd just like to see good arguments for specific quantitative answers to the stated questions. Some other thoughts:

We like it when people state cleanly summarizable, deductively valid arguments and carefully investigate the premises leading to the conclusion (analytic philosopher style). See e.g. Carlsmith 2021.
We also like it when people quantify their subjective probabilities explicitly. See e.g. Superforecasting by Phil Tetlock.
We like a lot of the features described here by Luke Muehlhauser, though they are not necessary to be persuasive.
We like it when people represent opposing points of view charitably, and avoid appeals to authority.
We think it could be pretty persuasive to us if some (potentially small) group of relevant technical experts arrived at and explained quite different conclusions. It would be more likely to be persuasive if they showed signs of comfort thinking in terms of subjective probability and calibration. Ideally they would clearly explain the errors in the best arguments cited in this post.

These are suggestions for how to be more likely to win the prize, but not requirements or guarantees.

Who do we have to convince in order to claim the prize?

Final decisions will be made at the discretion of the Future Fund. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. Probably, someone winning a large prize looks like them publishing their arguments, those arguments getting a lot of positive attention / being flagged to us by people we trust, us seriously engaging with those arguments (probably including talking to the authors), and then changing our minds.

Are these statistically significant probabilities grounded in detailed published models that are confirmed by strong empirical regularities that you're really confident in?

No. They are what we would consider fair betting odds.

This is a consequence of the conception of subjective probability that we are working with. As stated above in a footnote: "We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we'd be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them." For more on this conception of probability I recommend The Logic of Decision by Richard Jeffrey or this SEP entry.

Applicants need not agree with or use our same conception of probability, but hopefully these paragraphs help them understand where we are coming from.
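
For readers less familiar with the betting-odds reading of probability, here is a small Python sketch of what "fair" means in this sense: at stakes split in the ratio p : (1 - p), neither side of the bet has a positive expected profit, so someone whose credence is p would be roughly indifferent between the two sides. The function names and the pot size of 100 are hypothetical choices for illustration.

```python
def fair_stakes(p, total=100.0):
    """Split a pot so that a bettor with credence p is indifferent between sides.

    Returns (stake_for, stake_against): the amounts risked by the side betting
    that the proposition is true and by the side betting that it is false.
    """
    return p * total, (1.0 - p) * total


def expected_profit(p, stake_for, stake_against, bet_on_true):
    """Expected profit for one side, evaluated at credence p."""
    if bet_on_true:
        # Win the opponent's stake with probability p, lose your own otherwise.
        return p * stake_against - (1.0 - p) * stake_for
    return (1.0 - p) * stake_for - p * stake_against
```

For a 15% credence, `fair_stakes(0.15)` has the "true" side risking about 15 to win about 85, and `expected_profit` is approximately zero for both sides at those stakes.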

Why do the prizes only get awarded for large probability changes?

We think that large probability changes would have much clearer consequences for our work, and be much easier to recognize. We also think that aiming for changes of this size is less common and has higher expected upside, so we want to attract attention to it.

Why is the Future Fund judging this prize competition itself?

Our intent in judging the prize ourselves is not to suggest that our judgments should be treated as correct / authoritative by others. Instead, we're focused on our own probabilities because we think that is what will help us to learn as much as possible.

Additional terms and conditions

Employees of FTX Foundation and contest organizers are not eligible to win prizes.
Entrants and Winners must be over the age of 18, or have parental consent.
By entering the contest, entrants agree to the Terms & Conditions.
All taxes are the responsibility of the winners.
The legality of accepting the prize in his or her country is the responsibility of the winners. Sponsor may confirm the legality of sending prize money to winners who are residents of countries outside of the United States.
Winners will be notified in a future blogpost.
Winners grant to Sponsor the right to use their name and likeness for any purpose arising out of or related to the contest. Winners also grant to Sponsor a non-exclusive royalty-free license to reprint, publish and/or use the entry for any purpose arising out of or related to the contest including linking to or re-publishing the work.
Entrants warrant that they are eligible to receive the prize money from any relevant employer or from a contract standpoint.
Entrants agree that FTX Philanthropy and its affiliates shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes.
By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed and is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including any obligation entrant may have in connection arising from any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.

^[1]
We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we'd be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them.
^[2]
For the sake of definiteness, these are Nick Beckstead’s subjective probabilities, and they don’t necessarily represent the Future Fund team as a whole or its funders.
^[3]
It might be argued that this makes the prize encourage people to have views different from those presented here. This seems hard to avoid, since we are looking for information that changes our decisions, which requires changing our beliefs. People who hold views similar to ours can, however, win the $200k canonical reference prize.
^[4]
A slight update/improvement on something that would have won the prize in the past (e.g. this update by Ajeya Cotra) does not automatically qualify due to being better than the existing canonical reference. Roughly speaking, the update would need to be sufficiently large that the new content would be prize-worthy on its own.

Comments

Nick_Beckstead · Sep 24 2022 · 66

Do you believe some statement of this form?

"FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]"

E.g., if only they had…

Allowed people to publish not on EA Forum / LessWrong / Alignment Forum
Increased the prize schedule to X
Increased the window of the prize to size Y
Advertised the prize using method Z
Chosen the following judges instead
Explained X aspect of their views better

Even better would be a statement of the form:

"I personally would compete in this prize competition, but only if..."

If you think one of these statements or some other is true, please tell me what it is! I'd love to hear your pre-mortems, and fix the things I can (when sufficiently compelling and simple) so that we can learn as much as possible from this competition!

I also think predictions of this form will help with our learning, even if we don't have time/energy to implement the changes in question.

Habryka [Deactivated] · Sep 24 2022 · 67

I don't have anything great, but the best thing I could come up with was definitely "I feel most stuck because I don't know what your cruxes are".

I started writing a case for why I think AI X-Risk is high, but I really didn't know whether the things I was writing were going to be hitting at your biggest uncertainties. My sense is you probably read most of the same arguments that I have, so our difference in final opinion is probably generated by some other belief that you have that I don't, and I don't really know how to address that preemptively.

I might give it a try anyways, and this doesn't feel like a defeater, but in this space it's the biggest thing that came to mind.

Nick_Beckstead · Sep 24 2022 · 17

Thanks! The part of the post that was supposed to be most responsive to this on size of AI x-risk was this:

For "Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI." I am pretty sympathetic to the analysis of Joe Carlsmith here. I think Joe's estimates of the relevant probabilities are pretty reasonable (though the bottom line is perhaps somewhat low) and if someone convinced me that the probabilities on the premises in his argument should be much higher or lower I'd probably update. There are a number of reviews of Joe Carlsmith's work that were helpful to varying degrees but would not have won large prizes in this competition.

I think explanations of how Joe's probabilities should be different would help. Alternatively, an explanation of why some other set of propositions was relevant (with probabilities attached and mapped to a conclusion) could help.

Guy Raveh

Sep 26 2022

I think it's kinda weird and unproductive to focus a very large prize on things that would change a single person's views, rather than be robustly persuasive to many people. E.g. does this imply that you personally control all funding of the FF? (I assume you don't, but then it'd make sense to try to convince all FF managers, trustees etc.)

NunoSempere · Sep 24 2022 · 16

FWIW, I would prefer a post on "what actually drives your probabilities" over a "what are the reasons that you think will be most convincing to others".

nostalgebraist · Sep 26 2022 · 60

...if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.

In "AI Timelines: Where the Arguments, and the 'Experts,' Stand," Karnofsky writes:

Then, we commissioned external expert reviews.⁷
Speaking only for my own views, the "most important century" hypothesis seems to have survived all of this. Indeed, having examined the many angles and gotten more into the details, I believe it more strongly than before.

The footnote text reads, in part:

Reviews of Bio Anchors are here; reviews of Explosive Growth are here; reviews of Semi-informative Priors are here.

Many of these reviewers disagree strongly with the reports under review.

Davidson 2021 on semi-informative priors received three reviews.

By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report's reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).

From Hajek and Strasser's review:

His final probability of 7.3% is a nice summary of his conclusion, but its precision (including a decimal place!) belies the vagueness of the question, the imprecise and vague

...

Tom_Davidson · Sep 28 2022 · 18

if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.

I included responses to each review, explaining my reactions to it. What kind of additional explanation were you hoping for?

Davidson 2021 on semi-informative priors received three reviews.
By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report's reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).

For Hajek & Strasser's and Halpern’s reviews, I don't think "strong negative assessment" is supported by your quotes. The quotes focus on things like 'the reported numbers are too precise' and 'we should use more than a single probability measure' rather than whether the estimate is too high or too low overall or whether we should be worrying more vs less about TAI. I also think the reviews are more positive overall than you imply, e.g. Halpern's review says "This seems to be the most serious attempt to estimate when AGI will be developed that I’ve seen".

Davidson 2021 on explosive growth received many reviews... Tw

...

Guy Raveh

Sep 27 2022

Wow, thanks for this well written summary of expert reviews that I didn't know existed! Strongly upvoted.

Greg_Colbourn ⏸️

Sep 27 2022

I agree that finding the cruxes of disagreement is important, but I don't think any of the critical quotes you present above are that strong. The reviews of semi-informative priors talk about error bars and precision (i.e. critique the model), but don't actually give different answers. On explosive growth, Jones talks about the conclusion being contrary to his "intuitions", and acknowledges that "[his] views may prove wrong". Vollrath mentions "output and demand", but then talks about human productivity when regarding outputs, and admits that AI could create new in-demand products. If these are the best existing sources for lowering the Future Fund's probabilities, then I think someone should be able to do better.

Greg_Colbourn ⏸️

Sep 27 2022

On the other hand, I think that the real probabilities are higher, and am confused as to why the Future Fund haven't already updated to higher probabilities, given some of the writing already out there. I give a speculative reason here.

Sharmake

Sep 27 2022

Weakly downvoting because the claims are over-strong and the evidence doesn't fully support your view. This is weak evidence against AGI claims, but the claims in this comment are too strong. Quoting Greg Colbourn:

Linch

Sep 25 2022

I attach less than 50% credence to this belief, but probably more than to the existing alternative hypotheses:

FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]

Given 6 months or a year for people to submit to the contest rather than 3 months.

I think forming coherent worldviews takes a long time, most people have day jobs or school, and even people who have the flexibility to take weeks or a month off to work on this full-time probably need some warning to arrange this with their work. Also, some ideas take time to mull over, so you benefit from calendar time spread even when the clock time is the same.

As presented, I think this prize contest is best suited for people who a) basically have the counterarguments in mind/in verbal communication but never bothered to write them down or b) have a draft argument sitting in a folder somewhere and have never gotten around to publishing it. In that model, the best counterarguments are already "lying there" in somebody's head or computer and just need some incentives for people to make them rigorous.

However, if the best counterarguments are currently confu...

Peter Wildeford

Sep 26 2022

Maybe FTX-FF could commit in advance to re-running this contest next year if the grand prizes are not won this year?

Arden Koehler

Sep 24 2022

You might already be planning on doing this, but it seems like you increase the chance of getting a winning entry if you advertise this competition in a lot of non-EA spaces. I guess especially technical AI spaces e.g. labs, universities. Maybe also trying to advertise outside the US/UK. Given the size of the prize it might be easy to get people to pass on the advertisement among their groups. (Maybe there's a worry about getting flak somehow for this, though. It also increases overhead, since more entries would need to be read, though it sounds like you have some systems set up for that, which is great.)

In the same vein I think trying to lower the barriers to entry having to do with EA culture could be useful - e.g. +1 to someone else here talking about allowing posting places besides EAF/LW/AF, but also maybe trying to have some consulting researchers/judges who find it easier/more natural to engage in non-analytic-philosophy-style arguments.

Leon Lang

Sep 24 2022

… if only they had allowed people not to publish on EA Forum, LessWrong, and Alignment Forum :)

Honestly, it seems like a mistake to me to not allow other ways of submission. For example, some people may not want to publicly apply for a prize or be associated with our communities. An additional submission form might help with that.

Miles_Brundage

Sep 25 2022

Related to this, I think some aspects of the post were predictably off-putting to people who aren't already in these communities - examples include the specific citations* used (e.g. Holden's post which uses a silly sounding acronym [PASTA], and Ajeya's report which is in the unusual-to-most-people format of several Google Docs and is super long), and a style of writing that likely comes off as strange to people outside of these communities ("you can roughly model me as"; "all of this AI stuff").

*some of this critique has to do with the state of the literature, not just the selection thereof. But insofar as there is a serious interest here in engaging with folks outside of EA/rationalists/longtermists (not clear to me if this is the case), then either the selections could have been more careful or caveated, or new ones could have been created.

Guy Raveh

Sep 26 2022

I've also seen online pushback against the phrasing as a conditional probability: commenters felt putting a number on it is nonsensical because the events are (necessarily) poorly defined and there's way too much uncertainty.

Morpheus

Oct 6 2022

Do you also think this yourself? I don't clearly see what worlds would look like where P(doom | AGI) would be ambiguous in hindsight. Some major accident because everything is going too fast?

Guy Raveh

Oct 6 2022

There are some things we would recognize as an AGI, but others (that we're still worried about) are ambiguous. There are some things we would immediately recognize as 'doom' (like extinction) but others are more ambiguous (like those in Paul Christiano's "what failure looks like", or like a seemingly eternal dictatorship).

Morpheus

Oct 7 2022

I sort of view AGI as a stand-in for powerful optimization capable of killing us in AI Alignment contexts.

Morpheus

Oct 7 2022

Yeah, I think I would count these as unambiguous in hindsight. Though siren worlds might be an exception.

Geoffrey Miller

Sep 24 2022

I'm partly sympathetic to the idea of allowing submissions in other forums or formats. However, I think it's likely to be very valuable to the Future Fund and the prize judges, when sorting through potentially hundreds or thousands of submissions, to be able to see upvotes, comments, and criticisms from EA Forum, Less Wrong, and Alignment Forum, which is where many of the subject matter experts hang out. This will make it easier to identify essays that seem to get a lot of people excited, and that don't contain obvious flaws or oversights.

Guy Raveh

Sep 26 2022

I think it's the opposite. Only those experts who already share views similar to the FF (or more pessimistic) are there, and they'd introduce a large bias.

Leon Lang

Sep 24 2022

Yes, that makes sense. How about stating that reasoning and thereby nudging participants to post in the EA Forum/LessWrong/Alignment Forum, but additionally having a non-public submission form? My guess would be that only a small number of participants would then submit via the form, so the amount of additional work should be limited. This bet seems better to me than the current bet where you might miss really important contributions.

[anonymous]

Sep 25 2022

I really think you need to commit to reading everyone's work, even if it's an intern skimming it for 10 minutes as a sifting stage.

The way this is set up now - ideas proposed by unknown people in the community are unlikely to be engaged with, and so you won't read them.

Look at the recent cause exploration prizes. Half the winners had essentially no karma/engagement and were not forecasted to win. If Open Philanthropy hadn't committed to reading them all, they could easily have been missed.

Personally, yes I am much less likely to write something and put effort in if I think no one will read it.

David Thorstad

Sep 24 2022

Could you put some judges on the panel who are a bit less worried about AI risk than your typical EA would be? EA opinions tend to cluster quite strongly around an area of conceptual space that many non-EAs do not occupy, and it is often hard for people to evaluate views that differ radically from their own. Perhaps one of the superforecasters could be put directly onto the judging panel, pre-screening for someone who is less worried about AI risk.

weeatquince

Sep 25 2022

"FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [broadened the scope of the prizes beyond just influencing their probabilities]"

Examples of things someone considering entering the competition would presumably consider out of scope are:

Making a case that AI misalignment is the wrong level of focus – even if AI risks are high it could be that AI risks and other risks are very heavily weighted towards specific risk factor scenarios, such as a global hot or cold war. This view is apparently expressed by Will (see here).
Making a case based on tractability – that a focus on AI risk is misguided as the ability to affect such risks is low (not too far away from the views of Yudkowsky here).
Making the case that we should not put much decision weight on future predictions of risks – e.g. because long-run predictions of future technology are inevitably unreliable (see here), or because modern risk assessment best practice says that probability estimates should only play a limited role in risk assessments (my view expressed here), or other.
Making the case that some other x-risk is more pressing, more likely, more tractable, e

...

weeatquince

Sep 25 2022

Sorry I realise scrolling down that I am making much the same point as MichaelDickens' comment below. Hopefully added some depth or something useful.

NunoSempere

Sep 24 2022

"FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]"

"I personally would compete in this prize competition, but only if..."

Ehh, the above is too strong, but:

You would get more/better submissions if...
I would be more likely to compete in that if...

your reward schedule rewarded smaller shifts in proportion to how much they moved your probabilities (e.g., $X per bit).

E.g., as it is now, if two submissions together move you across a threshold, it would seem as if:

neither gets a prize
only the second gets a prize

and both seem suboptimal.

e.g., if you get information in one direction from one submission, but also information from another submission in another direction, and they cancel out, neither gets a reward. This is particularly annoying if it makes getting-a-prize-or-not depend on the order of submissions.

e.g., because individual people's marginal utility of money is diminishing, a 10% chance of reaching your threshold and getting $X will be way less valuable to participants than moving your opinion around 10% of the way to a threshold and getting $X/10.

e.g., if someone has information which points ...
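One way to make "$X per bit" concrete is sketched below. The comment does not say how bits would be measured, so the choice of log-odds shift as the unit, the function names, and the dollar rate are all assumptions made for illustration. Under that choice, two submissions that push in the same direction split exactly the reward a single combined submission would have earned, regardless of order, and two submissions that cancel out are both still paid.

```python
import math


def log_odds_bits(p: float) -> float:
    """Log-odds of probability p, measured in bits (base-2 logarithm)."""
    return math.log2(p / (1.0 - p))


def reward(p_before: float, p_after: float, dollars_per_bit: float) -> float:
    """Pay in proportion to the absolute log-odds shift a submission causes.

    Assumption: 'bits' means the change in log-odds; the original comment
    does not define the unit.
    """
    shift = abs(log_odds_bits(p_after) - log_odds_bits(p_before))
    return dollars_per_bit * shift


DOLLARS_PER_BIT = 100_000.0  # hypothetical rate, not from the contest rules

# Two submissions that move the judges in the same direction: 15% -> 20% -> 25%.
first = reward(0.15, 0.20, DOLLARS_PER_BIT)
second = reward(0.20, 0.25, DOLLARS_PER_BIT)
combined = reward(0.15, 0.25, DOLLARS_PER_BIT)

# Two submissions that cancel out: 15% -> 20% -> 15%.
up = reward(0.15, 0.20, DOLLARS_PER_BIT)
down = reward(0.20, 0.15, DOLLARS_PER_BIT)

print(f"same direction: {first:.0f} + {second:.0f} vs combined {combined:.0f}")
print(f"cancelling pair: {up:.0f} and {down:.0f}")
```

Whether paying both halves of a cancelling pair is desirable is a separate design question; the sketch only shows that a per-bit schedule removes the threshold and ordering effects described above.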

Owen Cotton-Barratt

Sep 25 2022

On the face of it an update 10% of the way towards a threshold should only be about 1% as valuable to decision-makers as an update all the way to the threshold.

(Two intuition pumps for why this is quadratic: a tiny shift in probabilities only affects a tiny fraction of prioritization decisions and only improves them by a tiny amount; or getting 100 updates of the size 1% of the way to a threshold is super unlikely to actually get you to a threshold since many of them are likely to cancel out.)

However you might well want to pay for information that leaves you better informed even if it doesn't change decisions (in expectation it could change future decisions).

Re. arguments split across multiple posts, perhaps it would be ideal to first decide the total prize pool depending on the value/magnitude of the total updates, and then decide on the share of credit allocation for the updates. I think that would avoid the weirdness about post order or incentivizing either bundling/unbundling considerations, while still paying out appropriately more for very large updates.

NunoSempere

Sep 26 2022

So I don't disagree that big shifts might be (much) more valuable than small shifts. But I do have the intuition that there is a split between:

What would the FTX Foundation find most valuable
What should they be incentivizing, because incentivizing providing information is more robust to various artifacts than incentivizing changing minds.

amc

Sep 25 2022

I don't understand this. Have you written about this or have a link that explains it?

Owen Cotton-Barratt

Sep 26 2022

Sorry I don't have a link. Here's an example that's a bit more spelled out (but still written too quickly to be careful):

Suppose there are two possible worlds, S and L (e.g. "short timelines" and "long timelines"). You currently assign 50% probability to each. You invest in actions which help with either until your expected marginal returns from investment in either are equal. If the two worlds have the same returns curves for actions on both, then you'll want a portfolio which is split 50/50 across the two (if you're the only investor; otherwise you'll want to push the global portfolio towards that).

Now you update to think that S is 1% more likely (51%, with L at 49%).

This changes your estimate of the value of marginal returns on S and on L. You rebalance the portfolio until the marginal returns are equal again -- which has 51% spending on S and 49% spending on L.

So you eliminated the marginal 1% spending on L and shifted it to a marginal 1% spending on S. How much better spent, on average, was the reallocated capital compared to before? Around 1%. So you got a 1% improvement on 1% of your spending.

If you'd made a 10% update you'd get roughly a 10% improvement on 10% of your spending. If you updated all the way to certainty on S you'd get to shift all of your money into S, and it would be a big improvement for each dollar shifted.

Lukas Finnveden

Sep 27 2022

I think this particular example requires an assumption of logarithmically diminishing returns, but is right with that. (I think the point about roughly quadratic value of information applies more broadly than just for logarithmically diminishing returns. And I hadn't realised it before. Seems important + underappreciated!)

One quirk to note: If a funder (who I want to be well-informed) is 50/50 on S vs L, but my all-things-considered belief is 60/40, then I would value the first 1% they shift towards my position much more than they do (maybe 10x more?) and will put comparatively little value on shifting them all the way (i.e. the last percent from 59% to 60% is much less important). You can get this from a pretty similar argument as in the above example.

(In fact, the funder's own much greater valuation of shifting 10% than 1% can be seen as a two-step process where (i) they shift to 60/40 beliefs, and then (ii) they first get a lot of value from shifting their allocation from 50 to 51, then slightly less from shifting from 51 to 52, etc...)

Owen Cotton-Barratt

Sep 27 2022

I agree with all this. I meant to state that I was assuming logarithmic returns for the example, although I do think some smoothness argument should be enough to get it to work for small shifts.
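The two-world example can be checked numerically. The sketch below assumes logarithmic returns, as stated above, so that the expected value of putting a fraction x on S under belief p is p·log(x) + (1−p)·log(1−x), which is maximised at x = p. The function names are made up for illustration.

```python
import math


def expected_log_return(alloc_s: float, p_s: float) -> float:
    """Expected log return from putting alloc_s on world S and the rest on L,
    evaluated under the belief that S has probability p_s."""
    return p_s * math.log(alloc_s) + (1.0 - p_s) * math.log(1.0 - alloc_s)


def gain_from_update(p_old: float, p_new: float) -> float:
    """Value of rebalancing from allocation p_old to allocation p_new,
    judged by the updated belief p_new (the optimal allocation equals the belief)."""
    return expected_log_return(p_new, p_new) - expected_log_return(p_old, p_new)


small = gain_from_update(0.50, 0.51)   # a 1-point update
large = gain_from_update(0.50, 0.60)   # a 10-point update
ratio = large / small

# The quirk about an outside observer: someone who believes 60/40 values the
# funder's first step from 50 to 51 by their own lights.
observer_gain = expected_log_return(0.51, 0.60) - expected_log_return(0.50, 0.60)

print(f"gain from 1-point update:  {small:.6f}")
print(f"gain from 10-point update: {large:.6f}")
print(f"ratio large/small:         {ratio:.1f}")
print(f"observer's value of first step / funder's own: {observer_gain / small:.1f}")
```

Under these assumptions the ratio comes out close to 100, matching the claim that an update 10% of the way is worth about 1% as much, and the 60/40 observer values the first step at well over ten times what the funder does.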

rgb

Sep 24 2022

I think that the post should explain briefly, or even just link to, what a “superforecaster” is. And if possible explain how and why this serves as an independent check.

The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.

(The post links to Tetlock’s book, but not in the context of explaining the panel)

Guy Raveh

Sep 26 2022

I think this would be better than the current state, but really any use of "superforecasters" is going to be extremely off-putting to outsiders.

rgb

Sep 26 2022

That may be right - an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty. Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there's only so much one can do to get everyone on board. It's a tricky balance of making it accessible to outsiders while still just saying what you believe about how the contest should work.

Guy Raveh

Sep 26 2022

To be clear, I wrote "superforecasters" not because I mean the word, but because I think the very notion is controversial like you said - for example, I personally doubt the existence of people who can be predictably "good at reasoning under uncertainty" in areas where they have no expertise.

Charlie Dougherty

Sep 26 2022

I would have also suggested a prize that generally confirms your views, but with an argument that you consider superior to your previous beliefs.

This prize is similar to the bias of printing research that claims something new rather than confirming previous research.

That would also resolve any particular bias baked into the process that compels people to convince you that you have to update instead of actually figuring out what they actually think is right.

Zach Stein-Perlman

Sep 24 2022

Agree with Habryka: I believe there exist decisive reasons to believe in shorter timelines and higher P(doom) than you accept, but I don't know what your cruxes are.

MichaelStJules

Sep 24 2022

If you think they're decisive, shouldn't you be able to write a persuasive argument without knowing the cruxes, although with (possibly much) more work?

Zach Stein-Perlman

Sep 24 2022

Sure (with a ton of work), though it would almost entirely consist of pointing to others' evidence and arguments (which I assume Nick would be broadly familiar with but would find less persuasive than I do, so maybe this project also requires imagining all the reasons we might disagree and responding to each of them...).

David Johnston

Sep 24 2022

FTX Foundation might get fewer submissions that change its mind than they would have gotten if only they had considered strategic updates prizeworthy.

The unconditional probability of takeover isn’t necessarily the question of most strategic interest. There’s a huge difference between “50% AI disempowers humans somehow on the basis of naive principle of indifference” and “50% MIRI-style assumptions about AI are correct”*. One might conclude from the second that the first is also true, but the first has no strategic implications (the principle of indifference ignores such things!), while the second has lots of strategic implications. For example, it suggests “totally lock down AI development, at least until we know more” is what we need to aim for. I’m not sure exactly where you stand on whether that is needed, but given that your stated position seems to be relying substantially on outside view type reasoning, it might be a big update.

The point is: middling probabilities of strategically critical hypotheses might actually be more important updates than extreme probabilities of strategically opaque hypotheses.

My suggestion (not necessarily a full solution) is that you consider big strategic updates potentially prizeworthy. For example: do we gain a lot by delaying AGI for a few years? If we consider all the plausible paths to AGI, do we gain a lot by hastening the development of the top 1% most aligned by a few years?

I think it’s probably too hard to pre-specify exactly which strategic updates would be prizeworthy.

*By which I mean something like “more AI capability eventually yields doom, no matter what, unless it’s highly aligned”

Noah Scales

Sep 28 2022

I personally would compete in this prize competition, but only if I were free to explore:

P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.

You wrote: but this list does not include the conditional probability that interests me.

You wrote: This seems really motivating. You identify:

global poverty
animal suffering
early death
debilitating disease

as problems that TAI could help humanity solve.

I will offer briefly that humans are sensitive to changes in their behaviors, at least as seen in advance, that deprive them of choices they have already made. We cause:

global poverty through economic systems that support exploitation of developing countries and politically-powerless people (e.g., through corporate capitalism and military coups)
animal suffering through widespread factory farming (enough to dominate terrestrial vertebrate populations globally with our farm animals) and gradual habitat destruction (enough to threaten the extinction of a million species)
early death through lifestyle-related debilitating disease (knock-on effects of lifestyle choices in affluent countries now spread throughout the globe).

So these TAI would apparently resolve, through advances in science and technology, various immediate causes, with a root cause found in our appetite (for wealth, power, meat, milk, and unhealthy lifestyles). Of course, there are other reasons for debilitating disease and early death than human appetite. However, your claim implies to me that we invent robots and AI to either reduce or feed our appetites harmlessly.

Causes of global poverty, animal suffering, some debilitating diseases, and early human death are maintained by incentive structures that benefit a subset of the global population. TAI will apparently remove those incentive structures, but not by any mechanism that I

MichaelDickens

Sep 23 2022

This is an excellent idea and seems like a good use of money, and the sort of thing that large orgs should do more of.

It looks to me like there is a gap in the space of mind-changing arguments that the prizes cover. The announcement raises the possibility that "a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem." But it seems to me that if someone successfully argues for this position, they won't be able to win any of the offered prizes.

Relatedly, if someone argues "AI is as important as you think, but some other cause is even more important than AI risk and you should be allocating more to it", I don't think this would win a prize, but it seems deserving of one.

(But it does seem harder to determine the winning criteria for prizes on those types of arguments.)

MichaelDickens

Sep 23 2022

After thinking some more, it also occurs to me that it would be easier to change your prioritization by changing your beliefs about expected tractability. For example, shifting P(misalignment x-risk|AGI) from 15% to 1.5% would be very hard, but my intuition is that shifting your subjective {expected total money required to solve AI alignment} by a factor of 10 would be significantly easier, and both have the same effect on the cost-effectiveness of AI work.

On the other hand, total money to solve AI alignment might be the wrong metric. Perhaps you expect it only costs (say) $1 billion, which is well within your budget, but that it costs 20 person-years of senior grantmaker time to allocate the money correctly. In that case, a 10x change in cost-effectiveness matters less than 10x (it still matters somewhat because higher cost-effectiveness means you can afford to spend less time thinking about which grants to make, and vice versa).
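The claimed equivalence between the two 10x shifts can be written out under a deliberately crude model. The ratio form below and the symbols p, V and C are assumptions for illustration, not something the comment specifies: p is P(misalignment x-risk|AGI), V is the value of averting the catastrophe, and C is the expected total money required to solve AI alignment.

```latex
\mathrm{CE} \;\approx\; \frac{p \, V}{C}
\qquad\Longrightarrow\qquad
\frac{(p/10)\, V}{C} \;=\; \frac{p \, V}{10\, C} \;=\; \frac{\mathrm{CE}}{10}
```

In this sketch, cutting p from 15% to 1.5% and raising C tenfold reduce the cost-effectiveness of AI work by the same factor, which is the sense in which the two shifts are interchangeable. The grantmaker-time caveat above is exactly a case where the single ratio stops being a good model.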

Nick_Beckstead

Sep 23 2022

Thanks for the feedback! This is an experiment, and if it goes well we might do more things like it in the future. For now, we thought it was best to start with something that we felt we could communicate and judge relatively cleanly.

howdoyousay?

Oct 3 2022

Thanks for clarifying this is in fact the case, Nick. I get how setting a benchmark - in this case an essay's persuasiveness at shifting probabilities you assign to different AGI / extinction scenarios - makes it easier to judge across the board. But as someone who works in this field, I can't say I'm excited by the competition or feel it will help advance things.

Basically, I don't know if this prize is incentivising things which matter most. Here's why:

1. The focus is squarely on likelihood of things going wrong against different timelines. It has nothing to do with solutions space
2. But solutions are still needed, even if the likelihood reduces / increases by a large amount, because the impact would be so high.
   1. Take Proposition 1: humanity going extinct or drastically curtailing its future due to loss of control of AGI. I can see how a paper which changes your probabilities from 15% to either 7% or 35% would lead to FTX changing the amount invested in this risk relative to other X risks - this is good. However, I doubt it would lead to full-on disinvestment; you would presumably still want to fund the best solutions, and still be worried if the solutions to hand looked weak
3. Moreover, capabilities advancements have rapidly changed priors of when AGI / transformative AI would be developed, and will likely continue to do so iteratively. Once this competition is done, new research could have shifted the dial again. Solutions space will likely be the same
4. So long as the capabilities-alignment advancements gap persists, solutions will more likely come from the AI governance space than AI alignment research space just yet
5. The solution space is pretty sparse still in terms of governance of AI. But given the argument in 2), I think this is a big risk and one where further work should be stimulated. There's likely loads of value off the table, people sitting on ideas, especially people outside the EA community who have worked in governance / non-

Geoffrey Miller

Sep 24 2022

tldr: Another way to signal-boost this competition might be through prestige and not just money, by including some well-known people as judges, such as Elon Musk, Vitalik Buterin, or Steven Pinker.

One premise here is that big money prizes can be highly motivating, and can provoke a lot of attention, including from researchers/critics who might not normally take AI alignment very seriously. I agree.

But, if Future Fund really wants maximum excitement, appeal, and publicity (so that the maximum number of smart people work hard to write great stuff), then apart from the monetary prize, it might be helpful to maximize the prestige of the competition, e.g. by including a few 'STEM celebrities' as judges.

For example, this could entail recruiting a few judges like tech billionaires Elon Musk, Jeff Bezos, Sergey Brin, Tim Cook, Ma Huateng, Ding Lei, or Jack Ma, crypto leaders such as Vitalik Buterin or Charles Hoskinson, and/or well-known popular science writers, science fiction writers/directors, science-savvy political leaders, etc. And maybe, for an adversarial perspective, some well-known AI X-risk skeptics such as Steven Pinker, Gary Marcus, etc.

Since these folks are mostly...

Max Kearney

Sep 27 2022

I also wonder if it would be cost-effective to spend some part of the contest’s budget on outreach to high-potential contributors. Rough/vague example: pay someone to…

research which individuals would likely have especially compelling arguments to contribute
determine which people in EA’s network are best positioned to make (personal) contact with those individuals
spend money to increase the likelihood that these individuals are successfully contacted and encouraged to submit something to the contest (e.g. arrange a dinner or meeting that they deem worthwhile to attend, where the contest is outlined to them)

Yonatan Cale

Sep 24 2022

TL;DR: We might need to ping pong with you in order to change your mind. We don't know why you believe what you believe.

60% AGI by 2100 seems really low (as well as 15% `P(misalignment x-risk|AGI)`). I'd need to know why you believe it in order to change your mind.

Specifically, I'd be happy to hear where you disagree with "AGI ruin scenarios are likely (and disjunctive)" by So8res.

Yonatan Cale

Sep 26 2022

Adding: I'm worried that nobody will address FTX's reasons to believe what they believe, and FTX will conclude "well, we put out a $1.5M bounty and nobody found flaws, they only addressed straw arguments that we don't even believe in, this is pretty strong evidence we are correct!"

Please consider replying, FTX!

philgoetz

Nov 13 2022

You wrote, "we think it's really possible that… a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem," and that you're interested in "alternative positions that would significantly alter the Future Fund's thinking about the future of AI." But then you laid out specifically what you want to see: data and arguments to change your probability estimates of the timeline for specific events.

This rules out any possibility of winning these contests by arguing that we should be focusing on entirely different aspects of the problem, or of presenting alternative positions that would significantly alter the Future Fund's thinking about the future of AI. It looks like the Future Fund has already settled on one way of thinking about the future of AI, and just wants help tweaking its Gantt chart.

I see AI safety as a monoculture, banging away for decades on methods that still seem hopeless, while dismissing all other approaches with a few paragraphs here and there. I don't know of any approaches being actively explored which I think clear the bar of having a higher expected value than doing nothing.

Part of the reason...

Nick_Beckstead

Sep 24 2022

Do you believe that there is something already published that should have moved our subjective probabilities outside of the ranges noted in the post? If so, I'd love to know what it is! Please use this thread to collect potential examples, and include a link. Some info about why it should have done that (if not obvious) would also be welcome. (Only new posts are eligible for the prizes, though.)

Zach Stein-Perlman

Sep 24 2022

I think considerations like those presented in Daniel Kokotajlo's Fun with +12 OOMs of Compute suggest that you should have ≥50% credence on AGI by 2043.

plex

Sep 24 2022

Agree, and add that code models won't be data constrained as they can generate their own training data. It's simple to write tests automatically, and you can run the code to see whether it passes the tests before adding it to your training dataset. As an unfortunate side effect, part of this process involves constantly and automatically running code output by a large model, and feeding it data which it generated so it can update its weights, both of which are not good safety-wise if the model is misaligned and power seeking. I don't know if this has been incorporated into a wider timelines analysis yet as it is quite recent, but this was a notable update for me given the latest scaling laws which indicate that data is the constraining factor, not parameter count. Much shorter timelines than 2043 seem like a live and strategically relevant possibility.
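The filtering loop described above can be sketched in a few lines. Everything here is hypothetical: the candidate programs are hard-coded strings standing in for model outputs, `solution` is an assumed entry-point name, and the tests are written by hand rather than generated. It only illustrates the "run the code, keep what passes" step, not any lab's actual pipeline.

```python
def passes_tests(source: str, tests: list, func_name: str = "solution") -> bool:
    """Run candidate source code and check it against (args, expected) pairs.

    Any exception (syntax error, missing function, runtime failure) counts
    as a failed candidate.
    """
    namespace: dict = {}
    try:
        # Executing generated code directly is exactly the safety concern the
        # comment raises; a real pipeline would need sandboxing.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def filter_candidates(candidates: list, tests: list) -> list:
    """Keep only the candidates that pass every test."""
    return [src for src in candidates if passes_tests(src, tests)]


# Hypothetical stand-ins for model-generated programs.
candidates = [
    "def solution(a, b):\n    return a + b\n",   # correct
    "def solution(a, b):\n    return a - b\n",   # wrong answer
    "def solution(a, b)\n    return a + b\n",    # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

kept = filter_candidates(candidates, tests)
print(f"kept {len(kept)} of {len(candidates)} candidates for the training set")
```

Only the surviving candidates would be added to the training data, which is why the amount of usable data is limited by compute for generation and testing rather than by the supply of human-written code.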

Greg_Colbourn ⏸️

Sep 24 2022

This is more of a meta-consideration around shared cultural background and norms. Could it just be a case of allowing yourselves to update toward more scary-sounding probabilities? You have all the information already. This video from Rob Miles ("There's No Rule That Says We'll Make It")[transcript copied from YouTube] made me think along these lines. Aside from background culture considerations around human exceptionalism (inspired by religion) and optimism favouring good endings (Hollywood; perhaps also history to date?), I think there is also an inherent conservatism borne by prestigious mega-philanthropy whereby a doom-laden outlook just doesn't fit in.

Optimism seems to tilt one in favour of conjunctive reasoning, and pessimism favours disjunctive reasoning. Are you factoring both in?

Geoffrey Miller

Sep 24 2022

This is a pretty deep and important point. There may be psychological and cultural biases that make it pretty hard to shift the expected likelihoods of worst-case AI scenarios much higher than they already are -- which might bias the essay contest against arguments winning even if they make a logically compelling case for more likely catastrophes. Maybe one way to reframe this is to consider the prediction “P(misalignment x-risk|AGI)” to also be contingent on us muddling along at the current level of AI alignment effort, without significant increases in funding, talent, insights, or breakthroughs. In other words, probability of very bad things happening, given AGI happening, but also given the status-quo level of effort on AI safety.

Yonatan Cale

Sep 25 2022

$100 to change my mind to FTX's views

If you change my mind to any of:

P(misalignment x-risk|AGI) is between ~~7%~~ 0% and 35%
AGI will be developed by January 1, 2100 is between ~~30%~~ 0% and 60%

I'm not adding the "by 2043" section because it is too complicated for me to currently think about clearly, so I don't think I'd be a good discussion partner, but I'd appreciate help there too.

My current opinion

Is that we're almost certainly doomed (80%? more?), I can't really see a way out before 2100 except for something like civilizational collapse.

My epistemic status

I'm not sure, I'm not FTX.

Pitch: You will be doing a good thing if you change my mind

You will help me decide whether to work on AI Safety, and if I do, I'll have better models to do it with. If I don't, I'll go back to focusing on the other projects I'm up to. I'm a bit isolated (I live in Israel), and talking to people from the international community who can help me not get stuck in my current opinions could really help me.

Technicalities

How to talk to me? I think the best would be to comment here so our discussion will be online and people can push back, but there are more contact methods in my profile. I don't officially

... (read more)

Yonatan Cale

Sep 25 2022

Replying to a DM: My current priors are roughly represented by "AGI ruin scenarios are likely (and disjunctive)". I also expect/guess my disagreement with many people would be around our priors, not the specifics. I think many people have a prior of "I'm not sure, and so let's assume we won't all die", which seems wrong, but I'm open to talk. I think most of the work in changing each other's minds will be locating the crux (as I suggested FTX would help us do with them).

Guy Raveh

Sep 29 2022

I'm willing to discuss this over Zoom, or face to face once I return to Israel in November. What I think my main points are:

* We don't seem to be anywhere near AGI. The amount of compute might very soon be enough but we also need major theoretical breakthroughs.
* Most extinction scenarios that I've read about or thought about require some amount of bad luck, at least if AGI is born out of the ML paradigm.
* AGI is poorly defined, so it's hard to reason about what it would do once it comes into existence, if you could even describe that as a binary event.
* It seems unlikely that a malignant AI succeeds in deceiving us until it is capable of preventing us from shutting it off.

I'm not entirely convinced of any of them - I haven't thought about this carefully.

Edit: there's a doom scenario that I'm more worried about, and it doesn't require AGI - and that's global domination by a tyrannical government.

Yonatan Cale

Sep 28 2022

For transparency: I'm discussing this with Andrew Timm here. (But please don't let this stop you from opening another conversation with me)

MichaelStJulesSep 23 202219

Two questions (although I very probably won't make a submission myself):

How likely do you think it is that anyone will win?
How many hours of work do you expect a winning submission to take? The reports you cite for informing your views seem like they were pretty substantial.

Nick_BecksteadSep 23 202215

We are very unsure on both counts! There are some Manifold Markets on the first question, though!

I do think articles wouldn't necessarily need to be that long to be convincing to us, and this may be a consequence of Open Philanthropy's thoroughness. Part of our hope for these prizes is that we'll get a wider range of people weighing in on these debates (and I'd expect less length there).

MichaelDickens

Sep 23 2022

Link doesn't work for me. What does work for me is going to http://manifold.markets/ and searching "future fund" (this gives me the exact URL that you linked, so I'm not sure why the link doesn't work).

RemmeltNov 11 202215

Question just to double-check: are posts no longer going to be evaluated for the AI Worldview Prize? Given, that is, that the FTX Future Fund team has resigned (https://forum.effectivealtruism.org/posts/xafpj3on76uRDoBja/the-ftx-future-fund-team-has-resigned-1).

Geoffrey MillerSep 24 202211

Strongly endorse this comment.

If we really take infohazards seriously, we shouldn't just be imagining EAs casually reading draft essays, sharing them, and the ideas gradually percolating out to potential bad actors.

Instead, we should take a fully adversarial, red-team mind-set, and ask, if a large, highly capable geopolitical power wanted to mine EA insights for potential applications of AI technology that could give them an advantage (even at some risk to humanity in general), how would we keep that from happening?

We would be naive to think that intelligence agencies of various major countries that are interested in AI don't have at least a few intelligence analysts reading EA Forum, LessWrong, & Alignment Forum, looking for tips that might be useful -- but that we might consider infohazards.

weeatquinceSep 28 202210

Nick, very excited by this and to see what this prize produces. One thing I would find super useful is to know your probability of a bio x-risk by 2100. Thanks.

Greg_Colbourn ⏸️ Sep 24 202210

Looking forward to seeing the entries. Similar to others, I feel that P(misalignment x-risk|AGI) is high (at least 35%, and likely >75%), so think that a prize for convincing FF of that should be won. Similar for P(AGI will be developed by January 1, 2043) >45%. But then I'm also not sure what would be needed in addition to all the great pieces of writing on this already out there (some mentioned in OP).

I'm hoping that there will be good entries from Eliezer Yudkowsky (on P(misalignment x-risk|AGI) >75%; previous), Ajeya Cotra (on P(AGI will be developed by January 1, 2043) >45%; previous), Daniel Kokotajlo (on P(AGI will be developed by January 1, 2043) >75%?; previous) and possibly Holden Karnofsky (although I'm not sure his credences for these probabilities are much different to FF's current baseline; previous). Also Carlsmith says he's recently (May 2022) updated his probabilities from "~5%" to ">10%" for P(misalignment x-risk) by 2070. This is unconditional, i.e. including p(AGI) by 2070, and his estimate for P(AGI by 2070) is 65%, so that puts him at P(misalignment x-risk|AGI) >15%, so an entry from him (for P(misalignment x-risk|A... (read more)

Greg_Colbourn ⏸️ Sep 24 202210

Is this the largest monetary prize in the world for a piece of writing? Is it also the largest in history?

Greg_Colbourn ⏸️

Sep 26 2022

Re the 2 disagreement votes on the parent comment: is this disagreement over me asking the question(s) (/drawing attention to the fact that it could be true)? Or answering the question(s) in the negative? If the latter, please link to bigger writing prizes.

Nathan YoungSep 26 20229

You should fly anyone who wins over 5k to meet with you in person. They have 1 hour to shift your credences by the same amount they already did (in Bayesian terms, not % difference^[1]). If they do, you'll give them the amount of money you already did.

I imagine some people arguing in person will be able to convince you better, both because there will be much greater bandwidth and because it allows for facial expressions and understanding the emotions behind an intellectual position, which are really important.

^[1]
If you move someone from 90% to 99%, the e

... (read more)

philgoetz

Nov 13 2022

That footnote is an important point. People need to learn to use odds ratios. Though I think that with odds ratios, the equivalent increase is to 1 - ((1/99) x ((1/99) / (10/90))) = 99.908%, not the intuitive-looking 99.9%. Also, the interpretation of odds ratios is often counter-intuitive when comparing test groups of different sizes. If P(X) >> P(~X) or P(X) << P(~X), the probability ratio P(W|X) / P(W|~X) can be very different from the odds ratio [P(W,X) / P(W,~X)] / [P(~W,X) / P(~W,~X)]. (Hope I've done that math right. The odds ratio would normally just use counts, but I used probabilities for both to make them more visually comparable.)
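For readers who want to check the arithmetic above, here is a minimal sketch of "repeating the same update in odds terms". The function names are hypothetical, and it assumes that "the same amount" means multiplying the odds by the same factor again. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def prob_to_odds(p: Fraction) -> Fraction:
    """Convert a probability into odds in favour, p : (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds: Fraction) -> Fraction:
    """Convert odds in favour back into a probability."""
    return odds / (1 + odds)


def repeat_update(p_before: Fraction, p_after: Fraction) -> Fraction:
    """Apply the odds multiplier implied by p_before -> p_after a second time."""
    factor = prob_to_odds(p_after) / prob_to_odds(p_before)
    return odds_to_prob(prob_to_odds(p_after) * factor)


# 90% -> 99% multiplies the odds by 11 (9:1 becomes 99:1).
# Applying the same factor again gives odds of 1089:1.
result = repeat_update(Fraction(9, 10), Fraction(99, 100))
print(result)  # 1089/1090, roughly 99.908%
```

The expression in the comment, 1 - 1/1089, is a close approximation of the exact value 1089/1090; both round to 99.908%.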

Zach Stein-PerlmanSep 23 20229

Are timelines-probabilities in this post conditional on no major endogenous slowdowns (due to major policy interventions on AI, major conflict due to AI, pivotal acts, safety-based disinclination, etc.)?

Nick_BecksteadSep 23 202212

No, they are unconditional.

Greg_Colbourn ⏸️

Sep 24 2022

There is also the feedback loop involving the Future Fund itself. As Michael Dickens points out here: I think it's much easier to argue that p(misalignment x-risk|AGI) >35% (or 75%) as things stand.

JoshuaBlake

Sep 24 2022

What does "as things stand" mean? If we invented AGI tomorrow? That doesn't seem like a useful prediction.

Greg_Colbourn ⏸️

Sep 24 2022

I'm thinking more along the lines of how things are with the current level of progress on AI Alignment and AI Governance, or assuming that the needle doesn't move appreciably on these. In the limit of zero needle movement, this would be equivalent to if AGI was invented tomorrow.

PabloSep 25 20228

This is a very exciting development!

In your third footnote, you write:

It might be argued that [rewarding participants for publishing analyses that move our subjective estimates significantly away from our current views] makes the prize encourage people to have views different from those presented here. This seems hard to avoid, since we are looking for information that changes our decisions, which requires changing our beliefs.

However, an analysis that reassures you that your current estimates are correct can make your beliefs more resilient, and in tu... (read more)

Nick_Beckstead

Sep 25 2022

Thanks for the feedback! I think this is a reasonable comment, and the main things that prevented us from doing this are: (i) I thought it would detract from the simplicity of the prize competition, and would be hard to communicate clearly and simply (ii) I think the main thing that would make our views more robust is seeing what the best arguments are for having quite different views, and this seems like it is addressed by the competition as it stands.

Yonatan CaleSep 24 20228

Would FTX be interested in opening a platform for safe handling of knowledge that should stay secret?

This is a platform that, to develop, we'd need to be in contact with a "customer" like FTX.

I think it needs more planning than a forum comment, though I endorse raising the subject.

h/t Edo Arad

Yonatan Cale

Sep 24 2022

(The solution might involve writing 0 new lines of code, but only using Signal or something like that, maybe)

zdgroffSep 23 20228

I'm toying with a project to gather reference classes for AGI-induced extinction and AGI takeover. If someone would like to collaborate, please get in touch.

(I'm aware of and giving thought to reference class tennis concerns but still think something like this is neglected.)

MichaelStJulesSep 23 20228

Minor nitpick: You describe your subjective probabilities in terms of fair betting odds, but aren't betting odds misleading/confusing, since if AGI kills everyone, there's no payout? Even loans that are forgiven or paid back depending on the outcome could be confusing, because the value of money could drastically change, although you could try to adjust for that like inflation. I'm not sure such an adjustment would be accurate, though.

Maybe you could talk about betting odds as if you're an observer outside this world or otherwise assume away (causal and acausal) influence other than through the payout. Or just don't use betting odds.

Nick_BecksteadSep 23 202214

Maybe you could talk about betting odds as if you're an observer outside this world or otherwise assume away (causal and acausal) influence other than through the payout.

Yes, the intention is roughly something like this.

David JohnstonSep 25 20227

I’m thinking of writing something for this. Most of the arguments I have in mind address the headline problem only partially. Do you mind if I make a series of, say, 5 posts as a single submission?

Nick_Beckstead

Sep 25 2022

For simplicity on our end, I'd appreciate if you had one post at the end that was the "official" entry, which links to the other posts. That would be OK!

Jay BaileySep 24 20225

Worth noting is that money like this is absolutely capable of shifting people's beliefs through motivated reasoning. Specifically, I might be tempted to argue for a probability outside the Future Fund's threshold, and for research I do to be motivated in favor of updating in this direction. Thus, my strategy would be to figure out your beliefs before looking at the contest, then look at the contest to see if you disagree with the Future Fund.

The questions are:

“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct... (read more)

eliflandOct 17 20224

“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.

I'm guessing this definition is meant to separate misalignment from misuse, but I'm curious whether you are including either/both of these 2 cases as misalignment x-risk:

AGI is deployed and we get locked into a great outcome by today's standards, but we get a world with <=1% of the value of "humanity's potential". So we sort of have an existential catastrophe, without a discrete cata

... (read more)

Nick_Beckstead

Oct 19 2022

1 - counts for purposes of this question
2 - doesn't count for purposes of this question (but would be a really big deal!)

TruePathSep 24 20224

The "go extinct" condition is a bit fuzzy. It seems like it would be better to express what you want to change your mind about as something more like P(go extinct | AGI)/P(go extinct) (I forget the term for this).

I know you've written the question in terms of going extinct because of AGI, but I worry this leads to ways of shifting that value upward that are relatively trivial or uninformative about AI.

For instance, consider a line of argument:

AGI is quite likely (probably by your own lights) to be developed by 2070.
If AGI is developed either it will suffer from serious

... (read more)

David Johnston

Sep 24 2022

I agree with this, and the “drastic reduction in long term value” part is even worse. It is implicitly counterfactual - drastic reductions have to be in reference to *something* - but what exactly the proposed counterfactual is is extremely vague. I worry that to some extent this vagueness will lead to people not exploring some answers to the question because they’re trying to self impose a “sensible counterfactual” constraint which, due to vagueness, won’t actually line up well with the kinds of counterfactuals the FTX foundation is interested in exploring.

helmetedhornbillSep 24 20223

Would you be able to say a little more about why part of your criteria seems to be degree of probability shift ("We will award larger prizes for larger changes to these probabilities, as follows...")? It seems to me that you might get analyses that offer larger changes but are less robust than other analyses that suggest smaller changes. I didn't understand how much of your formal evaluation will look at plausibility, argumentation, and soundness.

(asking as a curiosity not as a critique)

Nick_Beckstead

Sep 24 2022

Plausibility, argumentation, and soundness will be inputs into how much our subjective probabilities change. We framed this in terms of subjective probabilities because it seemed like the easiest way to crisply point at ideas which could change our prioritization in significant ways.

Nick_BecksteadSep 23 20223

There are some better processes that would be used for some smaller groups of high-trust people competing with each other, but I think we don't really have a good process for this particular use case of:

* Someone wants to publish something
* They are worried it might be an information hazard
* They want someone logical to look at it and assess that before they publish

I think it would be a useful service for someone to solve that problem. I am certainly feeling some pain from it right now, though I'm not sure how general it is. (I would think it's pretty general, especially in biosecurity, and I don't think there are good scalable processes in place right now.)

Fedor

Sep 26 2022

Hey, Lorenzo pointed me to this comment. I work in InfoSec. The first step is defining what your threats are, and what you are trying to defend. I'll be blunt: if large, highly capable geopolitical powers actively want to get your highly valuable information, beyond passive bulk collection, then they will be able to get it. I don't quite know how to say this, but security is bad at what we do. If you want to keep something secret that they want as much as, say, nuclear secrets, then we don't know how to do that in a way that will work with a high chance of success. If your information is sensitive and confidential, but nation state actors only want it as much as, say, something that would cause a press scandal, then there is opportunity. If you want to disclose infohazards safely, there's a lot to learn from whistleblower publisher orgs (like WikiLeaks), and CitizenLab. The cheap, usable option is for someone to have an otherwise unused phone and create a Protonmail and Signal account with it, and then publish those on any https website (like this forum), and then the info never gets forwarded from the phone. Publish the Protonmail PGP key, and make sure people email it either from Protonmail itself or, if they understand PGP, with PGP encryption (so not from normal Gmail). That gets everything to a device with minimal attack surface, and is reasonably user friendly. If you have problems in this area, I can help.

Lorenzo Buonanno🔸

Sep 24 2022

Probably missing something obvious, but could they either:

* PGP encrypt it with the reviewer's public key, and send it via email?
* Use an e2e encrypted messaging medium? (Don't know which are trustworthy, but I'm sure there's an expert consensus)

Or are those not user friendly enough? I think this is a solved problem in infosec (but am probably missing something)

Yonatan Cale

Sep 24 2022

(+1 to "not user friendly". Signal would be more user friendly, for example)

Lorenzo Buonanno🔸

Sep 24 2022

Protonmail and Signal are e2e encrypted messaging mediums. But depending on how paranoid the users need to be these systems might not provide enough guarantees, since you would need to trust the servers not to MITM. Unless you do some sort of in-person key-exchange. But I'm definitely not an expert. In general I think there are plenty of experts that know exactly how to handle these things and they're pretty easy to contact. Edit: I agree with acylhalide comment, if you have government-level actors this is potentially not enough.

EmrikNov 5 20222

This was gonna be a comment, but it turned into a post about whether large AI forecasting prizes could be suboptimal.

It's worth stating explicitly that an actual working technical solution to the alignment problem with low tax would substantially update the panel's beliefs about proposition A. So this isn't necessarily a contest about forecasting-arguments even if it's (misguidedly, imo) presented as one.
I think this matters because I don't see forecasting as being a very targeted use of time for making the world better. In the worst case, this contest can

... (read more)

Wei DaiOct 15 20222

But two formidable new problems for humanity could also arise

I think there are other AI-related problems that are comparable in seriousness to these two, which you may be neglecting (since you don't mention them here). These posts describe a few of them, and this post tried to comprehensively list my worries about AI x-risk.

Grant FlemingSep 30 20222

Interesting idea for a competition, but I don't think that the contest rules as designed and, more specifically, the information hazard policy, are well thought out for any submissions that follow the below line of argumentation when attempting to make the case for longer timelines:

Scaling current deep learning approaches in both compute and data will not be sufficient to achieve AGI, at least within the timeline specified by the competition
This is due to some critical component missing in the design of current deep neural networks
Supposing that this criti

... (read more)

Geoffrey MillerSep 25 20222

Question about how judges would handle multiple versions of essays for this competition. (I think this contest is a great idea; I'm just trying to anticipate some practical issues that might arise.)

EA Forum has an ethos of people offering ideas, getting feedback and criticism, and updating their ideas iteratively. For purposes of this contest, how would the judges treat essays that are developed in multiple versions?

For example, suppose a researcher posts version 1.0 of an essay on EA Forum with the "Future Fund worldview prize" tag. They get a bunch ... (read more)

Greg_Colbourn ⏸️ Sep 24 20222

Yes, arguably this prize doesn't require any original research (or STEM breakthroughs), so could be won just by convincing argumentation based on existing knowledge. Prizes relating to (non-fiction) writing seem like a more relevant reference class than scientific prizes. And this prize seems correspondingly a lot more accessible (and lower effort to enter) on the face of it.

Noah ScalesSep 24 20222

Hi!

I don't think I will participate in this contest, because:

pursuing AGI is an ethical no-no for me.
I like expert systems technology for what it offers.
I don't have much background knowledge on AGI risk.
I am not comfortable with subjective probability as you use it for forecasting.

However, after reading about this prize, I have several questions that came up for me as I read it. I thought I would offer them as a good-faith effort to clarify your goals here.

There are significant risks to human well-being, aside from human extinction, that are plausible in

... (read more)

Lorenzo Buonanno🔸

Sep 24 2022

What's the difference between extinction risk and existential risk?

Erich_Grunewald 🔸

Sep 24 2022

From the wiki: "An existential risk is the risk of an existential catastrophe, i.e. one that threatens the destruction of humanity’s longterm potential." That can include getting permanently locked into a totalitarian dictatorship and things of that sort, even if they don't result in extinction.

Lorenzo Buonanno🔸

Sep 24 2022

Thank you! And doubly thank you for the topic link. In case others are confused, I found the end of this post particularly clear https://forum.effectivealtruism.org/posts/qFdifovCmckujxEsq/existential-risk-is-badly-named-and-leads-to-narrow-focus-on

QOct 6 20221

A $200k prize for publishing any analysis which we consider the canonical critique of the current position highlighted above on any of the above questions

Question: In this formulation, what is meant by the "current position"? Just asking to be sure.

It could refer to the specific credences outlined above, but it would seem somewhat strange to say (e.g.) "here is what we regard as the canonical critique of 'AGI will be developed by January 1, 2043 =/= 20%'". So I am inclined to believe that it probably means something else.

I would love to know, since I... (read more)

cveresOct 6 20221

How do we tag the post?

The instructions say to tag the post with "Future Fund worldview prize", but it does not seem possible to do this. Only existing tags can be used for tagging as far as I can tell, and this tag is not in the list of options.

Charlie DoughertySep 26 20221

Could you provide a deeper idea of what you mean by "misaligned"?

finnhamblyNov 1 20221

I am unsure what you mean by AGI. You say:

For purposes of our definitions, we’ll count it as AGI being developed if there are AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in such a world [where cheap AI systems are fully substitutable for human labor].

and:

causing human extinction or drastically limiting humanity’s future potential may not show up as rapid GDP growth, but automatically counts for the purposes of this definition.

If someone uses AI capabilities to create a synthetic virus (wh... (read more)

Nick_Beckstead

Nov 1 2022

Thanks, I think this is subtle and I don't think I expressed this perfectly.

> If someone uses AI capabilities to create a synthetic virus (which they wouldn't have been able to do in the counterfactual world without that AI-generated capability) and caused the extinction or drastic curtailment of humanity, would that count as "AGI being developed"?

No, I would not count this. I'd probably count it if the AI a) somehow formed the intention to do this and then developed the pathogen and released it without human direction, but b) couldn't yet produce as much economic output as full automation of labor.

finnhambly

Nov 1 2022

Okay great, that makes sense to me. Thank you very much for the clarification!

Charles RathkopfOct 24 20221

Are essays submitted before December 23rd at an advantage over essays submitted on December 23rd?

Nick_Beckstead

Oct 25 2022

No official rules on that. I do think that if you have some back and forth in the comments that's a way to make your case more convincing, so some edge there.

paul_dfrOct 17 20221

Is it permitted to submit more than one entry if the entries are on different topics?

(Apologies if this has been answered somewhere already.)

Nick_Beckstead

Oct 19 2022

Yes

JMETSRSep 27 20220

How do we submit our essay for the contest? Is there an email we send it to or something?

Ross Rheingans-Yoo🔸

Oct 5 2022

cveres

Oct 6 2022

But it is not possible to tag the post with "Future Fund worldview prize". It seems to me that only existing tags can be used.

Lizka

Oct 6 2022

I think there was a tag, but it might have gotten deleted. I made a new one — you should be able to use it now.

CatneeSep 24 20220

I think it would be nicer if you say your P(Doom|AGI in 2070) instead of P(Doom|AGI by 2070), because the second one implicitly takes into account your timelines. Also, it would be nicer to have the same years: P(Doom | AGI in 2043) and P(Doom | AGI in 2100)

Ross Rheingans-Yoo🔸

Oct 5 2022

I disagree. (At least, if defining "nicer" as "more useful to the stated goals for the prizes".) As an interested observer, I think it's an advantage to take timelines into account. Specifically, I think the most compelling way to argue for a particular P(Catastrophe|AGI by 20__) to the FF prize evaluators will be:

* states and argues for a timelines distribution in terms of P(AGI in 20__) for a continuous range of 20__s
* states and argues for a conditional-catastrophe function in terms of P(Catastrophe|AGI in 20__) over the range
* integrates the product over the range to get a P(Catastrophe|AGI by 20__)
* argues that the final number isn't excessively sensitive to small shifts in the timelines distribution or the catastrophe-conditional-on-year function.

An argument which does all of this successfully is significantly more useful to informing the FF's actions than an argument which only defends a single P(Catastrophe|20__). I do agree that it would be nice to have the years line up, but as above I do expect a winning argument for P(Catastrophe|AGI by 2070) to more-or-less explicitly inform a P(Catastrophe|AGI by 2043), so I don't expect a huge loss. (Not speaking for the prizes organizers/evaluators, just for myself.)
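A minimal sketch of the integration step in that recipe, using a discrete sum over years in place of a continuous integral. Everything here is an assumption for illustration: the function name is hypothetical, the year buckets are coarse, and the numbers are made up rather than anyone's actual credences. Note that the summed product has to be divided by P(AGI by the cutoff) to yield a conditional probability.

```python
def p_catastrophe_given_agi_by(p_agi_in, p_cat_given_agi_in, cutoff_year):
    """Combine a timelines distribution with a per-year conditional risk.

    p_agi_in:            {year: P(AGI first developed in that year)}
    p_cat_given_agi_in:  {year: P(Catastrophe | AGI in that year)}
    Returns P(Catastrophe | AGI by cutoff_year).
    """
    years = [y for y in p_agi_in if y <= cutoff_year]
    p_agi_by = sum(p_agi_in[y] for y in years)
    if p_agi_by == 0:
        raise ValueError("P(AGI by cutoff) is zero; the conditional is undefined")
    # Sum of P(AGI in y) * P(Catastrophe | AGI in y), then normalise.
    joint = sum(p_agi_in[y] * p_cat_given_agi_in[y] for y in years)
    return joint / p_agi_by


# Purely illustrative numbers (NOT real forecasts).
timelines = {2030: 0.1, 2040: 0.2, 2050: 0.3}
risk_by_year = {2030: 0.5, 2040: 0.3, 2050: 0.1}

print(p_catastrophe_given_agi_by(timelines, risk_by_year, 2040))
print(p_catastrophe_given_agi_by(timelines, risk_by_year, 2050))
```

With these made-up inputs the "by 2050" figure comes out lower than the "by 2040" figure, because later arrival years carry lower assumed risk. This is the sense in which a "by" probability implicitly depends on timelines, and re-running with slightly shifted inputs is one way to check the sensitivity point in the last step.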