Hey! This is another update from the distillers at the AI Safety Info website (and its more playful clone Stampy).
Here are some of the answers that we wrote up over the last month (May 2023). As always, let us know in the comments if there are any questions you would like to see answered.
Each item in the list below links to an individual answer, and the collective URL above renders all of the answers in the list on one page.
How can I help tree
People from many different backgrounds want to contribute to AI safety, so we added an FAQ with advice on the various ways to help.
Here is the unified link for all the answers in the "How can I help" tree if you wish to navigate to aisafety.info directly. See Steven's post for the full list of individual questions and details.
Other Distillations
- Why would a misaligned superintelligence kill everyone in the world?
- Aren't AI existential risk concerns just an example of Pascal's mugging?
- What is Redwood Research's strategy?
- What is imitation learning?
- What is behavioral cloning?
- What is "coherent extrapolated volition"?
- What is compute?
- What is the "Bitter Lesson"?
- What is adversarial training?
- How is red teaming used in AI alignment?
- Isn't the real concern autonomous weapons?
- Isn’t it immoral to control and impose our values on AI?
- Isn't the real concern bias?
- Isn't the real concern AI being misused by terrorists or other bad actors?
- Would a slowdown in AI capabilities development decrease existential risk?
- What is a singleton?
- What is mindcrime?
- Are AIs conscious?
- What might an international treaty on the development of AGI look like?
- What are polysemantic neurons?
Developer Updates
The developers of the website have also worked on a couple of cool new features. We now have a tags page, which lets readers navigate the answers by the tag they are most interested in. There is also a glossary function, which automatically creates a short definition popup whenever an answer uses a jargon term that the reader might not be familiar with. Here is a sample answer that shows how this functionality works. Try hovering over the "agent", "s-risk", "Goodhart", etc. links.
This is all still a work in progress, but it is exciting to share what everyone on the Stampy project has been working on.
Cross-posted to Lesswrong: https://www.lesswrong.com/posts/fFtkQCasMjeEpMdhp/stampy-s-ai-safety-info-new-distillations-3-may-2023
I always get confused about the difference between superposition and polysemanticity. It would be great if the article clarified this.
It's always hard to balance brevity and de-jargonification in the answers against touching on every topic that people might be confused about, mainly because we expect people with a wide range of technical and non-technical backgrounds to interact with the answers. This means that we try to limit each answer to the essentials required for understanding the topic.
All that being said, I also don't really know if there is a major difference between polysemanticity and superposition. I am also unsure whether polysemantic vs. monosemantic neurons refer to the same underlying concept as disentangled vs. distributed representations, because all these concepts sound like they are describing the same thing. I took note of your comment in the thread about that particular answer, and will get back to it when I learn more.
If you have any resources, posts, or papers that point out that they are indeed different, let me know and I'll write up a new answer. Something like "What is the difference between polysemantic neurons and superposition?" and "What is the difference between solving polysemanticity in neurons and disentangling latent space representations?"
Alternatively, if you come across an awesome explanation and feel like writing it up, you could just edit the Stampy docs on superposition or polysemanticity yourself.