$20K in Bounties for AI Safety Public Materials

TW123; Dan H; Oliver Z

TLDR

We are announcing a $20k bounty for publicly-understandable explainers of AI safety concepts. We are also releasing the results of the AI Safety Arguments competition.

Background

Of the technologists, ML researchers, and policymakers thinking about AI, very few are seriously thinking about AI existential safety. As a result, less high-quality research gets done, and safety solutions could be harder to deploy in the future.

There is no single solution to this problem. However, an increase in the number of publicly accessible discussions of AI risk can help to shift the Overton window towards more serious consideration of AI safety.

Capability advances have surprised many in the broader ML community. Just as these advances have made it easier to discuss AGI, they can also make it easier to discuss existential safety. Still, there are not many good introductory resources on the topic or its subtopics. Someone with no background might need to read books or very long sequences of posts to understand why people are worried about AI x-risk. There are a few strong, short introductions to AI x-risk, but some of them are out of date, and they are not suited to all audiences.

Shane Legg, a co-founder of DeepMind, recently said the following about AGI:^[1]

If you go back 10-12 years ago the whole notion of Artificial General Intelligence was lunatic fringe. People [in the field] would literally just roll their eyes and just walk away. [I had that happen] multiple times. [...] [But] every year [the number of people who roll their eyes] becomes less.

We hope that the number of people rolling their eyes at AI safety can be reduced, too. In the case of AGI, increased AI capabilities and public relations efforts by major AI labs have fed more discussion. Conscious efforts to increase public understanding and knowledge of safety could have a similar effect.

Bounty details

The Center for AI Safety is announcing a $20,000 bounty for the best publicly-understandable explainers of topics in AI safety. Winners of the bounty will win $2,000 each, for a total of up to ten possible bounty recipients. The bounty is subject to the Terms and Conditions below.

By publicly understandable, we mean understandable to somebody who has never read a book or technical paper on AI safety and who has never read LessWrong or the EA Forum. Work may or may not assume technical knowledge of deep learning and related math, but should make minimal assumptions beyond that.

By explainer, we mean that it digests existing research and ideas into a coherent and comprehensible piece of writing. This means that the work should draw from multiple sources. This is not a bounty for original research; it is intended for work that covers more ground, at a higher level, than entries to the distillation contest.

Below are some examples of public materials that we value. This should not be taken as an exhaustive list of all existing valuable public contributions.

AI risk executive summary (2014)
Concrete Problems in AI Safety (2016)
Robert Miles’ YouTube channel (2017-present)
AGI Safety From First Principles (2020)
The case for taking AI risk seriously as a threat to humanity (2020)
Unsolved Problems in ML Safety (2021)
X-risk Analysis for AI Research (2022)

Note that many of the works above are quite different and do not always agree with each other. Listing them isn’t to say that we agree with everything in them, and we don’t expect to necessarily agree with all claims in the pieces we award bounties to. However, we will not award bounties to work we believe is false or misleading.

Here are some categories of work we believe could be valuable:

Executive summaries that lay out a case for the overall importance of AI safety.
Work that explains considerations around a particular area in AI safety, summarizes existing work in the area, and discusses its relative importance. We are especially interested in writing regarding the topics below. More discussion and explanation of each can be found here.
- Deception and deceptive alignment
- Power-seeking behavior
- Emergent goals, intrasystem goals, mesa-optimization
- Weaponization of AI
- The “enfeeblement problem”
- Eroded epistemics caused by persuasive AI
- Proxy misspecification
- Value lock-in
Any other publicly understandable explainer that offers a valuable perspective on large-scale and existential risk from AI.

There is no particular length of submission we are seeking, but we expect most winning submissions will take less than 30 minutes for a reader/viewer/listener to digest.

Award Process

Judging will be conducted on a rolling basis, and we may award bounties at any time. Judging is at the discretion of the Center for AI Safety. Winners of the bounty must allow their work to be reprinted with attribution to the author, though not necessarily with a link to the original post.

How to submit

We will accept several kinds of submissions:

Newly published work (first published after August 4th, 2022), for example a paper in an academic venue or a blog post.
Posts on the EA Forum, Alignment Forum, and LessWrong tagged with the AI Safety Public Materials tag.
Other forms of media (for example, YouTube videos, podcasts, visual art, infographics) are also accepted.
Referral links to public materials previously published that weren’t already on our radar. Referrers of a winning entry will be given nominal prizes.

If your submission is released somewhere other than the forums above, you may submit a link to it here.

The competition will run from today, August 4th, 2022, until December 31st, 2022. The bounty will be awarded on a rolling basis. If funds run out before the end date, we will edit this post to indicate that and also notify everyone who filled out the interest form below.

If you are interested in potentially writing something for this bounty, please fill out this interest form! We may connect you with others interested in working on similar things.

AI Safety Arguments Competition Results

In our previously announced AI Safety Arguments competition, we aimed to compile short arguments for the importance of AI safety. The main intention of the competition was to build a collection of points that others could riff on in their own work.

We received over 800 submissions and are releasing the top ~10% here. These submissions were selected through an effort by 29 volunteers, followed by manual review and fact-checking by our team, and prizes will be distributed among them in varying proportions.^[2] Many of the submissions were drawn from previously existing work. The spreadsheet format is inspired by Victoria Krakovna’s very useful spreadsheet of examples of specification gaming.

We would like to note that it is important to be mindful of potential downsides when doing any kind of broader outreach, and it’s especially important that those doing outreach are familiar with the audiences they plan to interact with. If you are thinking of using the arguments for public outreach, please consider reaching out to us beforehand. You can contact us at info@centerforaisafety.org.

We hope that the arguments we have identified can serve as a useful compilation of common points for AI safety public outreach, and that they can be used in a wide variety of work, including the kind of work we are seeking in the bounty above.

Terms and Conditions for Bounty

Employees or current contractors of the Center for AI Safety and contest organizers are not eligible to win prizes.
Entrants and Winners must be over the age of 18.
By entering the contest, entrants agree to the Terms & Conditions.
All taxes are the responsibility of the winners.
The legality of accepting the prize in his or her country is the responsibility of the winners. Sponsor may confirm the legality of sending prize money to winners who are residents of countries outside of the United States.
Winners will be notified by email (for submissions through the form) or direct message on the forum (for EA Forum, Alignment Forum, and LessWrong submissions).
Winners grant to Sponsor the right to use their name and likeness for any purpose arising out of or related to the contest. Winners also grant to Sponsor a non-exclusive royalty-free license to reprint, publish and/or use the entry for any purpose arising out of or related to the contest, including linking to or re-publishing the work.
Entrants warrant that they are eligible to receive the prize money from any relevant employer or from a contract standpoint.
Entrants agree that the Center for AI Safety shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes.
By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed and is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including any obligation entrant may have in connection arising from any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.

[1] This was part of one of the winning submissions in the AI safety arguments competition, detailed below.
[2] All winners have been notified. If you have not yet received notice via email or a LessWrong/EA Forum message, that unfortunately means you did not win.

JakubK, Aug 6 2022

This is a great idea! For the future, maybe make the bounty a little higher? Writing a super amazing post for this contest could take >100 hours, and the contest offers a possibility of getting paid <$20 per hour for that. I suspect there could be some 80/20 rule at work for the impact of these posts, so super amazing posts are especially important to incentivize.

Amber Dawn, Aug 5 2022

I'm interested in collaborating on this with someone who knows a lot about AI safety, but doesn't have the time, ability or inclination to write a public-facing explainer - you could explain a topic to me over a call or calls, and I could write it up. I'm very much not an expert on AI safety, but in some ways that might be good for something like this - I don't have the curse of knowledge so I'll have a better sense of what people who are new to the subject will or will not understand.

Algo_Law, Aug 6 2022

The audience is broadly scoped, which is useful, but just to confirm: would an explainer of AI safety issues that affect other fields, written for audiences in those fields, count?

For example, producing content on AI safety risks in economics for economists, or on AI safety topics in medicine for doctors?

aog, Aug 5 2022

This looks really cool, thanks for sharing. Would you be able to say more about who the audience is, and how you'll publicize this writing? The venue of publication seems like one of the more important factors in determining the impact of the writing, and different venues call for different writing styles. For example, I'd write very different pieces for a Vox explainer, a Brookings report, a published paper, or a PDF attached to an email. Where do you plan to publish by default? And do you think it would be worthwhile to identify and write for a specific venue, perhaps by working with relevant coauthors?

TW123, Aug 6 2022

We don't expect the work to be published anywhere when it's submitted.

For certain pieces, we may work with authors to publish them somewhere, publish them on our website, or adapt them and publish an adapted version somewhere. But this is not guaranteed.

In general, we expect that the best pieces will be generally suited for an audience of either smart people who don't know about ML, or ML researchers. Though there is a lot of room for pieces that are more optimized for particular audiences and venues, we think that more general pieces would serve as great inspiration for those later pieces.

[anonymous], Aug 5 2022

From the bullet list above, it sounds like the author will be the one responsible for publishing and publicising the work.

Caro, Nov 30 2022

Very excited about this competition! Is it still happening?

Oliver Z, Dec 1 2022

Yup! The bounty is still ongoing. We have been awarding prizes throughout the duration of the bounty and will post an update in January detailing the results.

Yitz, Aug 5 2022

Question—is $20,000 awarded to every entry which qualifies under the rules, or is there one winner selected among the pool of all who submit an entry?

TW123, Aug 5 2022

I edited the title to say "$20k in bounties" to make it more clear.

From the original text:

Winners of the bounty will win $2,000 each, for a total of up to ten possible bounty recipients.

This doesn't mean each person who submits an entry gets $2,000. We will award this to entries that meet a high bar for quality (roughly, material that we would actually be interested in using for outreach).

Thanks for the clarification! I might try to do something on the Orthogonality thesis if I get the chance, since I think that tends to be glossed over in a lot of popular introductions.

Effective Altruism Forum