The ARC performance is a huge update for me.
I've previously found Francois Chollet's arguments that LLMs are unlikely to scale to AGI pretty convincing, mainly because he had created an until-now unbeaten benchmark to back those arguments up.
But reading his linked write-up, he describes this as "not merely an incremental improvement, but a genuine breakthrough". He does not admit he was wrong, but instead paints o3 as something fundamentally different to previous LLM-based AIs, which, for the purpose of assessing the significance of o3, amounts to the same thing!
I think the presentation of this argument here misses some important considerations:
The way that you want us to act with respect to OP is already the way that OP is trying to act with respect to the rest of the world
EAs don't fund the most important causes based purely on scale (otherwise tonnes of things EAs ignore would score highly, e.g. vaccination programs in rich countries). A core part of EA is looking for causes which are neglected. We look for the areas that are receiving the least funding relative to what they would receive in our ideal world, because these are likely to be the areas where our donations will have the highest marginal impact.
This is the reply to people who argue "oh you want local charities to disappear and to send all the money to malaria nets". The reply is: "No! In my ideal world, malaria nets would quickly attract all the funding they need. Then there would still be plenty of money left over for other things. But I think I should look at the world I actually live in, recognize that malaria nets are outrageously underfunded, and give all my resources there."
So in a sense, the argument you are making here isn't anything new. You are just saying we should try to act towards other EAs in a similar way to how EAs as a group act towards the rest of the world. And I don't disagree with this. But I think we should go all the way: we should treat other EAs in exactly the same way that we treat the rest of the world. If I understand your argument correctly, you are instead trying to draw a distinction between the EA community and everyone else, and I don't see what justifies that distinction.
The same considerations that lead OP to choose not to allocate all their funds to the highest expected value cause should also be relevant for individual donors
OP do not allocate all of their funding to the 'best' cause. Even if OP were a pure EV maximizer, they might have valid reasons not to do this, because they have such a big budget. It may be that diminishing marginal returns mean that the 'best' cause stops being the best once OP have given a certain level of funds to it, at which point they should switch to funding another cause instead.
But my impression is that this is not OP's reason for donating to multiple causes (or at least not their only reason). They are not purely trying to maximize expected value, or at least not in a naive first-order way. One reason to diversify might be donor risk aversion, as you mention (e.g. you want to maximize EV while bounding the risk that you have no positive impact at all), and there are plenty of other considerations that might come into it too, e.g. a sense of duty to a certain cause, reputation, or a belief that some uncertainty is unquantifiable and some cause comparisons are impossible to make.
But if these considerations are valid for OP then they should also be relevant for individual donors. For example, if an individual donor wants to bound the risk that they have no impact, then that might well mean not donating everything to the cause they think is most underfunded by OP. Donating everything to one cause would only make sense if they had a weird type of risk aversion where they want to bound the risk that the EA community as a whole has no positive impact, but are unconcerned about the risk attached to their own donations. This seems very arbitrary! Either they should care about the risk to their own donations, and should diversify, or they should be concerned with all of humanity's donations, in which case OP should not be diversifying either!
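To make the risk-aversion point concrete, here is a minimal sketch with entirely made-up numbers (the causes and their failure probabilities are hypothetical, not taken from the post): a donor who cares about the chance that their own donations achieve nothing reduces that chance by splitting across causes, which is exactly the kind of diversification OP does at the portfolio level.

```python
# Toy model, not from the post: each cause independently turns out to have
# zero impact with some (invented) probability.
p_fail = {"cause_A": 0.3, "cause_B": 0.4, "cause_C": 0.5}

def prob_no_impact(allocation):
    """Probability that every cause this donor funded turns out to have zero impact."""
    prob = 1.0
    for cause, amount in allocation.items():
        if amount > 0:
            prob *= p_fail[cause]
    return prob

concentrated = {"cause_A": 1.0, "cause_B": 0.0, "cause_C": 0.0}
diversified = {"cause_A": 0.5, "cause_B": 0.3, "cause_C": 0.2}

print(prob_no_impact(concentrated))  # 0.3  -> 30% chance of achieving nothing
print(prob_no_impact(diversified))   # 0.06 -> diversifying bounds the donor's own risk
```

The point of the toy model is just that the same maths applies whether the "donor" is an individual or OP, so it is hard to justify this kind of risk aversion at one level but not the other.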
Pure EV maximizers don't care about percentages anyway
You could bite the bullet and say that neither OP nor individual donors should be diversifying their donations (except when faced with diminishing marginal utility). On this view, individual donors should be donating everything to one cause (and probably one charity, unless they have a lot to give!). But even for these donors, it's not which causes OP underfund that really matters, it's which causes all of humanity underfunds. So it is not the percentages of OP's funding allocation that matter, it's the absolute value.
If OP are a relatively small player in a cause area (global health..?) then their donation decisions are unlikely to be especially relevant to the individual donor. If they thought global health was the top cause before OP donations were taken into account, it probably still will be afterwards. But if OP are a relatively big player (animal welfare..?) then their donations are more relevant, due to diminishing marginal utility. Either way, it is the absolute amount of funding they are moving, not the percentages, which will determine this.
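Here is a quick toy model of the "absolute value, not percentages" point (the log-shaped returns and the funding figures are assumptions for illustration, not anything from the post): with diminishing returns, the marginal value of the next dollar depends only on the total funding a cause already receives from all sources, and OP's share of that total drops out entirely.

```python
# Toy diminishing-returns model (assumed for illustration): the value of a cause
# grows with the log of its total funding, so the marginal value of an extra
# dollar is 1 / total_funding, regardless of who supplied the existing funding.
def marginal_value(total_funding_millions: float) -> float:
    return 1.0 / total_funding_millions

# Two hypothetical causes with the same total funding but very different OP shares.
cause_a_total = 100.0  # $100m total, of which (say) $90m comes from OP
cause_b_total = 100.0  # $100m total, of which (say) $10m comes from OP

print(marginal_value(cause_a_total))  # 0.01
print(marginal_value(cause_b_total))  # 0.01 -> identical marginal value: OP's
                                      # percentage allocation is irrelevant,
                                      # only the absolute total matters
```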
I've been vegan for 11 years, and to me the growth felt faster in the first 5 years than it has in the years since. This could easily just be due to my changing life circumstances (I spent the first 5 years as a student and living with other vegans), but that's my personal anecdotal evidence. Recently it also seems like all the vegan restaurants have been closing in my city (Manchester, UK), although hopefully(?) that is more to do with the economic situation than with a decline in veganism.
The link you've shared on the proportion of the population identifying as vegan is encouraging, but I'm finding it hard to figure out the data source for their graph. I'm sure I saw some data shared on the EA Forum recently suggesting that the growth of veganism has been stagnating, but I'm not sure how to find that now!
This seems like a really important question though and I'd love to read an in-depth analysis of what the answer is likely to be.
This is a fascinating analysis, but if I understand it correctly, you are estimating the impact of fishing and agriculture on average wild animal wellbeing (which you estimate via its effect on the death rate), not on total wellbeing, which is what the first sentence of your post says you are estimating. Is that correct?
This seems important, as I don't think there are many people who would defend the idea that average welfare is what matters in population ethics? So I'm not sure how much weight the considerations you point out should carry. The change in population size seems likely to be the much more important effect here.
It also doesn't seem obvious to me that we should be able to estimate the impact of fishing or agriculture on average welfare purely from their effect on the death rate. Aren't there lots of other ways they could affect wild animal welfare too (e.g. by changing the cause of death for wild animals)?
Same content on bluesky, for those avoiding twitter now: https://bsky.app/profile/wdmacaskill.bsky.social/post/3lcdlb4lbdk2m
I think this is a fascinating area, and the problems you've highlighted seem like important problems. I find it hard to believe it's a cause area EAs should focus on though.
As you explain, the clearest threat is the impact on cryptography, but it doesn't seem likely to me that that problem is neglected. There are huge incentives for governments and companies to solve it, and I think they are probably already doing lots of work on it..?
A question jumped out at me when reading these results. I should caveat this by emphasizing that I am very much not an expert in this kind of evaluation and this question may be naive.
Is there any seasonal effect on mortality in Malawi? If so, is it OK for the pre-intervention period to be 12 months while the post-intervention period is 18 months?
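To spell out why I'm asking, here's a toy calculation with invented numbers, assuming (purely for illustration) a seasonal mortality pattern, no real intervention effect, and that the extra six post-intervention months happen to fall in the higher-mortality season:

```python
# Invented monthly mortality rates (per 1,000) with a simple seasonal pattern,
# identical every year, and no intervention effect at all.
high_season = [12] * 6   # six higher-mortality months
low_season = [8] * 6     # six lower-mortality months
one_year = high_season + low_season

pre_12_months = one_year                 # 12-month pre-intervention window
post_18_months = one_year + high_season  # 18-month window counts the high season twice

print(sum(pre_12_months) / len(pre_12_months))    # 10.0
print(sum(post_18_months) / len(post_18_months))  # ~10.67 -> an apparent change in
                                                  # mortality driven purely by the
                                                  # mismatched window lengths
```

The bias could of course run in either direction depending on which seasons the extra months cover; the worry is only that a 12-month vs 18-month comparison doesn't weight the seasons equally.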
It might be fair to say that the o3 improvements are something fundamentally different to simple scaling, and that Chollet is still correct in his 'LLMs will not simply scale to AGI' prediction. I didn't mean in my comment to suggest he was wrong about that.
I could imagine someone criticizing him for exaggerating how far away we were from coming up with the necessary new ideas, given the o3 results, but I'm not so interested in the debate about exactly how right or wrong the predictions of this one person were.
The interesting thing for me is: whether he was wrong, or whether he was right and o3 really does represent a fundamentally different kind of model, the upshot for how seriously we should take o3 seems the same! It feels like a pretty big deal!
He could have reacted to this news by criticizing the way that o3 achieved its results. He already said in the Dwarkesh Patel interview that someone beating ARC wouldn't necessarily imply progress towards general intelligence if the way they achieved it went against the spirit of the task. When I clicked the link in this post, I thought it likely I was about to read an argument along those lines. But that's not what I got. Instead he was acknowledging that this was important progress.
I'm by no means an expert, but timelines in the 2030s still seems pretty close to me! I'd have thought, based on arguments from people like Chollet, that we might be a bit further off than that (although only with the low confidence of a layperson trying to interpret the competing predictions of experts who seem to radically disagree with each other).
Given all the problems you mention, and the high costs still involved in running this on simple tasks, I agree it still seems many years away. But previously I'd have put a fairly significant probability on AGI not being possible this century (as well as assigning a significant probability to it happening very soon, leaving me highly uncertain overall). These results make the idea that AGI is still 100 years away feel much less plausible than it did before.