AI interpretability
• Applied to: What should I read about defining AI "hallucination?" (11d ago)
• Applied to: Worries about latent reasoning in LLMs (14d ago)
• Applied to: Implications of the inference scaling paradigm for AI safety (19d ago)
• Applied to: What are polysemantic neurons? (1mo ago)
• Applied to: Beyond Meta: Large Concept Models Will Win (1mo ago)
• Applied to: Takes on "Alignment Faking in Large Language Models" (2mo ago)
• Applied to: LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that. (2mo ago)
• Applied to: Motivation control (3mo ago)
• Applied to: A Rocket–Interpretability Analogy (3mo ago)
• Applied to: 5 ways to improve CoT faithfulness (4mo ago)
• Applied to: Solving adversarial attacks in computer vision as a baby version of general AI alignment (5mo ago)
• Applied to: AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 (7mo ago)
• Applied to: Rational Animations' intro to mechanistic interpretability (8mo ago)
• Applied to: ML4Good Brasil - Applications Open (9mo ago)
• Applied to: A Selection of Randomly Selected SAE Features (10mo ago)
• Applied to: AI alignment as a translation problem (1y ago)
• Applied to: ML4Good UK - Applications Open (1y ago)
• Applied to: Assessment of AI safety agendas: think about the downside risk (1y ago)
• Applied to: Public Call for Interest in Mathematical Alignment (1y ago)