AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts and encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt: Write a letter to the CA and DE Attorneys General.

I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and AGs are elected officials primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't think AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.

Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.

Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to DM me.
Notes on some of my AI-related confusions[1]

It’s hard for me to get a sense for stuff like “how quickly are we moving towards the kind of AI that I’m really worried about?” I think this stems partly from (1) a conflation of different types of “crazy powerful AI”, and (2) the way that benchmarks and other measures of “AI progress” decouple from actual progress towards the relevant things. Trying to represent these things graphically helps me orient/think.

First, it seems useful to distinguish the breadth or generality of state-of-the-art AI models from how capable they are on some relevant set of capabilities. Once I separate these out, I can plot roughly where some definitions of "crazy powerful AI" apparently lie on these axes. (I think there are too many definitions of "AGI" at this point. Many people would make that area much narrower, but possibly in different ways.)

Visualizing things this way also makes it easier for me[2] to ask: Where do various threat models kick in? Where do we get “transformative” effects? (Where does “TAI” lie?)

Another question that I keep thinking about is something like: “What are key narrow (sets of) capabilities such that the risks from models grow ~linearly as they improve on those capabilities?” Or maybe: “What is the narrowest set of capabilities for which we capture basically all the relevant info by turning the axes above into something like ‘average ability on that set’ and ‘coverage of those abilities’, and then plotting how risk changes as we move the frontier?” The most plausible sets of abilities like this might be something like:

* Everything necessary for AI R&D[3]
* Long-horizon planning and technical skills?

If I try the former, how does risk from different AI systems change? We could try drawing some curves that represent our guesses about how the risk changes as we make progress on a narrow set of AI capabilities on the x-axis. This is very hard; I worry that companies focus on benchmarks in ways that
Holden Karnofsky has joined Anthropic (LinkedIn profile). I haven't been able to find more information.
A lot of post-AGI predictions are more like 1920s predictions of flying cars (technically feasible, maximally desirable if there were no other constraints, the same thing as the current system but better) than like predicting EasyJet: crammed low-cost airlines (physical constraints imposing economic constraints, shaped by iterative regulation, different from the current system).
The U.S. State Department will reportedly use AI tools to trawl social media accounts, in order to detect pro-Hamas sentiment to be used as grounds for visa revocations (per Axios). Regardless of your views on the matter, regardless of whether you trust the same government that at best had a 40% hit rate on ‘woke science’ to do this: They are clearly charging ahead on this stuff. The kind of thoughtful consideration of the risks that we’d like is clearly not happening here. So why would we expect it to happen when it comes to existential risks, or a capability race with a foreign power?
If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview. Caveats: 1. you're employable (we can sponsor visas from most but not all countries); 2. you use the same hardware; 3. honor system that you didn't take more time than our human subjects (8 hours). If you took more time, still send it to me; we will probably still be interested in talking. (Crossposted from Twitter.)
Both Sam and Dario saying that they now believe they know how to build AGI seems like an underrated development to me. To my knowledge, they only started saying this recently. I suspect they are overconfident, but it still seems like a more significant indicator than many people are tracking.
It's the first official day of the AI Action Summit, and thus it's also the day that the Seoul Commitments (made by sixteen companies last year to adopt an RSP/safety framework) have come due. I've made a tracker/report card for each of these policies at www.seoul-tracker.org. I'll plan to keep this updated for the foreseeable future as policies get released/modified.

Don't take the grades too seriously — think of it as one opinionated take on the quality of the commitments as written, and, in cases where there is evidence, as implemented. Do feel free to share feedback if anything you see surprises you, or if you think the report card misses something important.

My personal takeaway is that both compliance and quality for these policies are much worse than I would have hoped. I believe many people's theories of change for these policies gesture at something about a race to the top, where companies are eager to outcompete each other on safety to win talent and public trust, but I don't sense much urgency or rigor here. Another theory of change is that this is a sort of laboratory for future regulation, where companies can experiment now with safety practices and the best ones could be codified. But most of the diversity between policies here is in how vague they can be while claiming to manage risks :/

I'm really hoping this changes as AGI gets closer and companies feel they need to do more to prove to govts/public that they can be trusted. Part of my hope is that this report card makes clear to outsiders that not all voluntary safety frameworks are equally credible.