At the end of 2022, following the success of the 2021 MIRI Conversations, Conjecture started a project to host discussions about AGI and alignment with key people in the field. The goal was simple: surface positions and disagreements, identify cruxes, and make these debates public whenever possible for collective benefit.
Given that people and organizations will have to coordinate to best navigate AI's increasing effects, this is the first, minimum-viable step toward coordination. Coordination is impossible without at least common knowledge of the relevant actors' positions and models.
People sharing their beliefs, discussing them, and making as much of that public as possible is strongly positive for several reasons.
First, beliefs expressed in public discussions count as micro-commitments or micro-predictions, and help keep the field honest and truth-seeking. When things are only discussed privately, humans tend to weasel around and take inconsistent positions over time, be it intentionally or involuntarily.
Second, commenters help debates progress faster by pointing out mistakes.
Third, public debates compound. Knowledge shared publicly makes the next generation of arguments more refined and moves public discourse forward.
We circulated a document about the project to various groups in the field, and invited people from OpenAI, DeepMind, Anthropic, Open Philanthropy, FTX Future Fund, ARC, and MIRI, as well as some independent researchers, to participate in the discussions. We prioritized speaking to people at AGI labs, given that they are focused on building AGI capabilities.
The format of discussions was as follows:
- A brief initial exchange with the participants to decide on the topics of discussion. By default, the discussion topic was “How hard is Alignment?”, since we've found we disagree with most people about this, and the reasons for it touch on many core cruxes about AI.
- We held each discussion synchronously for ~120 minutes, in writing, in a dedicated, private Slack channel.
- We involved a moderator when possible. The moderator's role was to help participants identify and address their cruxes, move the conversation forward, and summarize points of contention.
- We planned to publish cleaned-up versions of the transcripts and summaries to Astral Codex Ten, LessWrong, and the EA Forum. Participants were given the opportunity to clarify positions and redact information they considered infohazards or PR risks, as well as to veto publishing altogether. We included this clause specifically to address the concerns expressed by people at AI labs, who expected heavy scrutiny by leadership and communications teams on what they can state publicly.
People from ARC, DeepMind, and OpenAI, as well as one independent researcher, agreed to participate. The two discussions with Paul Christiano and John Wentworth will be published shortly. One discussion with a person working at DeepMind is pending approval before publication. After a discussion with an OpenAI researcher took place, OpenAI strongly recommended that its employee not publish, so we will not be publishing that discussion.
Most people we were in touch with were very interested in participating. However, after checking with their own organizations, many returned saying their organizations would not approve them sharing their positions publicly.
This was in spite of the extensive provisions we made to reduce downsides for them: letting participants edit the transcript and veto publishing, moderating comments strictly, and so on. We think organizations discouraging their employees from speaking openly about their views on AI risk is harmful, and we want to encourage more openness.
We are pausing the project for now, and we have mixed feelings about it. It cost a lot of time to organize and conduct, and we were disappointed to see resistance to having and publishing discussions. On the other hand, the participants and moderators did find them enjoyable and valuable. We expect that even the few discussions that we'll be able to publish will improve public discourse and understanding of cruxes.
We believe Conjecture's status in the AI alignment field at the time was not sufficient to get enough traction, but we encourage any high-status person to try launching similar initiatives.
We'll be interested in running discussions like these again in the future if there's renewed interest, and we appreciate everyone involved in this round.
Ideally there would be an exceedingly high bar for strategic withholding of worldviews. I'd love some mechanism for sending downvotes to the orgs that vetoed their staff's participation! I'd love some way of socially pressuring these orgs into at least trying to convince us that they had really good reasons.
I'm pretty cynical: I assume nervous and uncalibrated shuffling of HR or legal counsel is more likely than actual defense against hazardous leakage of, say, capabilities hints.