Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

mic; Dylan Xu; caroq

This is a linkpost for https://www.lesswrong.com/posts/PXr38b64ECtFcn4Yq/supervised-program-for-alignment-research-spar-at-uc

In Spring 2023, the Berkeley AI Safety Initiative for Students (BASIS) organized an alignment research program for students, drawing inspiration from similar programs by Stanford AI Alignment^[1] and OxAI Safety Hub. We brought together 12 researchers from organizations like CHAI, FAR AI, Redwood Research, and Anthropic, and 38 research participants from UC Berkeley and beyond.

SPAR’s website (https://berkeleyaisafety.com/spar) includes all of the details about the program. We’ll be running the program again in the Fall 2023 semester as an intercollegiate program, coordinating with a number of local groups and researchers from across the globe.

If you are interested in supervising an AI safety project in Fall 2023, learn more here and fill out our project proposal form, ideally by August 25. Applications for participants will be released in the coming weeks.

Motivation

Since a primary goal of university alignment organizations is to produce counterfactual alignment researchers, there seems to be great value in encouraging university students to conduct research in AI safety, both for object-level contributions and as an opportunity to gain experience and test fit. While programs like AI Safety Fundamentals, representing the top of a “funnel” of engagement in the alignment community, have been widely adopted as a template for the introductory outreach of university groups, we do not think there are similarly ubiquitous options for engaged, technically impressive students interested in alignment to further their involvement productively. Research is not the only feasible way to do this, but it holds various advantages: many of the strongest students are more interested in research than in other types of programs that might introduce them to AI safety, projects have the potential to produce object-level results, and project results provide a signal of participants’ potential for future alignment research.

Many university alignment groups have run research programs on a smaller scale and have generally reported bottlenecks such as lack of organizer capacity and difficulty obtaining mentorship and oversight for projects; we believe an intercollegiate model with centralized administration can alleviate these problems.

Additionally, we believe that many talented potential mentors with “implementation-ready” project ideas would benefit from a streamlined opportunity to direct a team of students on such projects. If our application process is able to select for capable students, and if the program's administrators are given the resources to aid mentors in project management, we think that this program could represent a scalable model for making such projects happen counterfactually.

While programs like SERI MATS maintain a very high bar for mentors, with streams usually headed by well-established alignment researchers, we believe that graduate students and some SERI MATS scholars would be good fits as SPAR mentors if they have exciting project ideas and are willing to provide guidance to teams of undergrads. Further, since SPAR gives mentors complete freedom over the number of mentees, the interview process, and the ultimate selectivity of their students, the program may also be desirable to more senior mentors. An intercollegiate pool of applicants will hopefully raise the bar of applicants and allow mentors to set ambitious application criteria for potential mentees.

Research projects

Each project was advised by a researcher in the field of AI safety. In total, we had about a dozen research projects in Spring 2023:

Supervisor	Project Title
Erdem Bıyık and Vivek Myers, UC Berkeley / CHAI	Inferring Objectives in Multi-Agent Simultaneous-Action Systems
Erik Jenner, UC Berkeley / CHAI	Literature Review on Abstractions of Computations
Joe Benton, Redwood Research	Disentangling representations of sparse features in neural networks
Nora Belrose, FAR AI (now at EleutherAI)	Exhaustively Eliciting Truthlike Features in Language Models
Juan Rocamonde, FAR AI	Using Natural Language Instructions to Safely Steer RL Agents
Kellin Pelrine, FAR AI	Detecting and Correcting for Misinformation in Large Datasets
Zac Hatfield-Dodds, Anthropic	Open-source software engineering projects (to help students develop skills for research engineering)
Walter Laurito, FZI / SERI MATS	Consistent Representations of Truth by Contrast-Consistent Search (CCS)
Leon Lang, University of Amsterdam / SERI MATS	RL Agents Evading Learned Shutdownability
Marius Hobbhahn, International Max Planck Research School / SERI MATS (now at Apollo Research)	Playing the auditing game on small toy models (trojans/backdoor detection)
Asa Cooper Stickland, University of Edinburgh / SERI MATS	Understanding to what extent language models “know what they don't know”

You can learn more about the program on our website: https://berkeleyaisafety.com/spar

Here is an incomplete list of some of the public writeups from the program:

Pelrine, Kellin et al. “Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4,” May 2023. Accepted to the ACL 2023 Student Research Workshop.
Lermen, Simon, Teun van der Weij, and Leon Lang. “Evaluating Language Model Behaviors for Shutdown Avoidance in Textual Scenarios,” May 2023.
Jenner, Erik et al. “A comparison of causal scrubbing, causal abstractions, and related methods,” June 2023.

Operational logistics

This section might be of the most interest to people interested in organizing similar programs; feel free to skip this part if it’s not relevant to you.

A few weeks before the start of the semester, we reached out to a variety of AI safety researchers based in the Berkeley/SF area. In all, 12 researchers submitted project proposals. We also asked researchers about their desired qualifications for applicants. In general, most projects required strong experience with deep learning or reinforcement learning.
We publicized the application to UC Berkeley students within the first week of school. It was due on January 25, providing students approximately a week to complete the first-round application. (For context, this is a typical deadline for tech club applications at UC Berkeley.)
We also created a variant application for the broader AI safety community, not just students at UC Berkeley, which opened us up to a wider talent pool. The non-UCB application was due on January 25. SPAR mentors received and viewed UC Berkeley applicants before non-Berkeley ones, which provided the former group an advantage.
- For future rounds, we plan to have a fully inter-collegiate process and equal deadlines for Berkeley and non-Berkeley applicants.
We were able to offer research credits to UC Berkeley participants through our faculty advisor, Stuart Russell.
We didn't want to limit ourselves to participants who had already learned about AI safety because we only started our reading group in Fall 2022. We required participants who had not previously learned about AI safety at a level of depth analogous to AI Safety Fundamentals (AISF) to enroll in our AI safety DeCal (student-led course).
We received 34 applications to SPAR from UC Berkeley and 62 external applications. From these, mentors accepted 17 participants from UC Berkeley and 21 external participants.
- Since project descriptions were clear about expected qualifications, the applicant pool seemed fairly strong.
We gave mentors considerable freedom in selecting applicants to their project, rather than assigning groups. Many chose to personally interview applicants, after reviewing their application responses.
In general, successful applicants tended to have good research experience in machine learning. We believe pairing SPAR with our DeCal led our club members to be technically much stronger than they would otherwise have been.
Considerable organizer time was spent on communicating between applicants and mentors via email. In the future, we hope to streamline this process.
We (as BASIS organizers) didn't have to spend much time overseeing projects during the middle of the semester. This contrasts with the model of OxAI Safety Labs, where organizers took a more active role in assigning project tasks.
- Unfortunately, this also meant that we had less ability to proactively monitor which projects were going off-track. In the future, we would want to stay more informed about how projects are going and help with course-correction where useful.
Aside from SPAR and the student-led class, we also organized a weekend retreat in Berkeley with Stanford AI Alignment, in which we invited AI safety researchers to give talks and offer Q&As for students.
At the end of the semester, we concluded with a series of project presentations.

Room for improvement

We note a few ways in which our program operations this past semester were suboptimal:

Failures to delegate: The bulk of the work fell onto one organizer due to time-sensitive communications and failures to delegate.
Planning for the program too late: We began preparing for this program very close to the start of the semester (~1 month in advance). (One organizer also anticipated 3-5 projects and was not prepared for how large the program would be!)
- This left minimal time to advertise the program. Anecdotally, one organizer visited another CS club’s social event and talked to a few students who thought the program was neat but didn’t consider applying because of existing time commitments.
Planning fallacy and lack of foresight: We didn’t concretely plan through each step of the application process, which led to inefficiencies.
Lack of funding: Due to short program timelines and learning that similar student programs were not able to secure funding, we decided not to apply for funding for the program. This meant that we weren’t able to immediately reimburse compute usage, for example.

Conclusion

Overall, although we faced some challenges running this program for the first time, we are excited about the potential here and are looking to scale up in future semesters. We are also coordinating with the AI safety clubs at Georgia Tech and Stanford to organize our next round of SPAR.

If you would like to supervise a research project, learn more about the Fall 2023 program and complete our project proposal form by August 25.

Feel free to contact us at aisafetyberkeley@gmail.com if you have any questions.

^[1]
Special thanks to Gabe Mukobi and Aaron Scher for sharing a number of invaluable resources from Stanford AI Alignment’s Supervised Program in Alignment Research, which we drew heavily from, not least the program name.
