Note: this post is a transcript of a talk I gave at EA Global: Bay Area 2023.
These days, a lot of effective altruists are working on trying to make sure AI goes well. But I often worry that, as a community, we don’t yet have a clear picture of what we’re really working on.
The key problem is that predicting the future is very difficult, and if you don’t know what the future will look like, it’s hard to be sure that any intervention we make now will turn out to be highly valuable in hindsight.
When EAs imagine the future of AI, I think a lot of us tend to have something like the following picture in our heads.
At some point, maybe 5, 15, or 30 years from now, some AI lab somewhere is going to build AGI. This AGI is going to be very powerful in a lot of ways. And we’re either going to succeed in aligning it, and then the future will turn out to be bright and wonderful, or we’ll fail, and the AGI will make humanity go extinct; it’s not yet clear which of these two outcomes will happen.
Alright, so that’s an oversimplified picture. There’s lots of disagreement in our community about specific details in this story. For example, we sometimes talk about whether there will be one AGI or several. Or about whether there will be a fast takeoff or a slow takeoff.
But even if you’re confident about some of these details, I think there are plausibly some huge open questions about the future of AI that perhaps no one understands very well.
Take the question of what AGI will look like once it’s developed.
If you had asked an informed observer in 2013 what AGI would look like, I think it’s somewhat likely they’d have guessed it would be an agent that we program directly to search through a tree of possible future actions and select the one that maximizes expected utility, using some very clever heuristics that allow it to do this in the real world.
In 2018, if you had asked EAs what AGI would look like, a decent number of people would have told you that it would be created using some very clever deep reinforcement learning in a really complex and diverse environment.
And these days in 2023, if you ask EAs what they expect AGI to look like, a fairly high fraction of people will say that it will look like a large language model: something like ChatGPT but scaled up dramatically, trained on more than one modality, and using a much better architecture.
That’s just my impression of how people’s views have changed over time. Maybe I’m completely wrong about this. But the rough sense I’ve gotten while in this community is that people will often cling to a particular model of what future AI will be like, and that model frequently changes over time. And at any particular time, people will often be quite overconfident in their exact picture of AGI.
In fact, I think the state of affairs is even worse than how I’ve described it so far. I’m not even sure if this particular question about AGI is coherent. The term “AGI” makes it sound like there will be some natural class of computer programs called “general AIs” that are sharply distinguished from this other class of programs called “narrow AIs”, and at some point – in fact, on a particular date – we will create the “first” AGI. I’m not really sure that story makes much sense.
The question of what future AI will look like is a huge question, and getting it wrong could make the difference between a successful research program, and one that never went anywhere. And yet, it seems to me that, as of 2023, we still don’t have very strong reasons to think that the way we think about future AI will end up being right on many of the basic details.
In general I think that uncertainty about the future of AI is a big problem, but we can also do a lot to reduce our uncertainty.
If you don’t fully understand a risk, a simple alternative to working on mitigating the risk directly is just to try to understand the phenomenon better. In the case of AI, we could try to rigorously forecast AI, similar to how climate scientists try to build a model of the climate to better understand and forecast it.
That’s essentially the agenda I’m working on at Epoch, a new organization that's trying to map out the future of AI. We work on foundational questions like: what will AI look like in the future, what effects will it have on the world, and when should we expect some of the most important AI-related things to happen?
What I want to do in this talk is outline three threads of research that we’re currently interested in that we think are tractable to work on, and could end up being critical for piecing together the story of how AI will unfold in the future.
Software vs. hardware progress
The first thread of research I want to talk about is the relative importance of hardware versus software as a driver of AI progress.
This problem has been explored before. In the modern context of deep learning, probably the most salient analysis is from Danny Hernandez and Tom Brown at OpenAI in 2020. In their research, they re-implemented 15 open source machine learning models, adjusted some of the hyperparameters, and found that the amount of compute required to reach the same performance as AlexNet on the ImageNet dataset had fallen 44-fold since 2012.
Last year Epoch researchers re-evaluated progress on ImageNet using a different methodology employing scaling laws, and came to an even more extreme conclusion, showing that the amount of compute required to reach a certain level of performance halved roughly every nine months.
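To get a rough sense of how these two estimates compare, here’s a back-of-the-envelope calculation. The arithmetic is my own, not from either paper, and it assumes the 44-fold figure covers roughly the seven years (84 months) from 2012 to 2019:

```latex
t_{1/2} \;=\; \frac{84\ \text{months}}{\log_2 44} \;\approx\; \frac{84}{5.46} \;\approx\; 15\ \text{months}
\quad\text{(implied by the 44-fold result)},
\qquad
2^{84/9} \;\approx\; 650\text{-fold}
\quad\text{(implied over the same window by a 9-month halving time)}.
```

In other words, the two methodologies attribute quite different amounts of algorithmic progress to the same period.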
But even with these analyses, our basic picture of software progress in AI is unfortunately very incomplete. A big issue is that it’s not clear how much software progress on ImageNet transfers usefully to the kinds of tasks we actually care about.
For example, it’s easy to imagine that some ways of achieving better performance on ImageNet, like inventing the Adam optimizer, are general insights that speed up the entire field of AI. Whereas other things we can do, like pre-processing the ImageNet dataset, pretty much only provide improvement on ImageNet itself.
Plausibly, what matters most for forecasting AI is the rate of general software progress, which as far as I can tell, is currently unknown.
Here’s another big issue with the existing research on measuring software progress, one that I think is even more important than the last problem. Compare two simple models of how software progress happens in AI.
In the first model of software progress, humans come up with better algorithms more-or-less from the armchair. That is, we leverage our intelligence to find clever algorithms that allow us to train AIs more efficiently, and the more intelligent we are, the better our algorithmic insights will tend to be.
In the second model of software progress, we make progress mostly by stumbling around in the dark. At various points in time, researchers experiment with different ways of training AIs, with relatively little cleverness involved. Sometimes these experiments pan out, but most of the time they don’t. Crucially, the more experiments we perform, the more likely we are to get lucky and stumble upon an algorithm for training AI that’s more efficient than anything that came before.
These stories are different in a very important way. In the first story, what allowed us to make software progress was having a lot of intelligence. If this story is true, then as AI gets more advanced, we might expect software progress to accelerate, since we can leverage the intelligence of AI to produce more algorithmic insights, which makes training even more advanced AIs easier, which we can then leverage to produce the next generation of AIs, and so on, in a feedback loop. I believe Tom Davidson refers to this scenario as a software singularity.
However, in the second story, hardware and labor are ultimately what enabled us to make software progress. By having more hardware available, researchers were able to try out more experiments, which enabled them to discover which algorithmic insights work and which ones don’t. If this second story is true, then plausibly access to compute, rather than raw intelligence, is a more important driver of AI progress. And if that’s true, then it’s not totally clear whether there will be a rapid acceleration in software progress after we begin to leverage smart AI to help us develop even smarter AI.
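One stylized way to see how far apart these two stories can end up (this is my own toy formalization, not something from the talk or from Tom Davidson’s model): let S(t) be an index of software efficiency and H(t) the exogenously growing supply of hardware. If the rate of software progress scales with the capability of the increasingly AI-assisted researchers, you get something like the first line below, which diverges in finite time; if it instead scales with the number of experiments the available hardware supports, you get something like the second, whose growth is pinned to the hardware trajectory.

```latex
\text{Story 1:}\quad \frac{dS}{dt} \;\propto\; S^{\,1+\epsilon},\ \epsilon > 0
\;\;\Longrightarrow\;\; S(t) \to \infty \ \text{in finite time (a software singularity)}
\\[6pt]
\text{Story 2:}\quad \frac{dS}{dt} \;\propto\; H(t)
\;\;\Longrightarrow\;\; S(t)\ \text{grows only about as fast as the hardware supply does}
```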
These are massively different pictures of what the future might look like, and yet personally, I’m not sure which of these stories is more plausible.
And the question of whether AI progress is driven by hardware or software is highly important to policy. If we cleared up confusion about the relative significance of hardware and software in AI, it would provide us information about what levers are most important to influence if we want to intervene on AI progress.
For example, if hardware is the main driver of AI progress, then access to hardware is the key lever we should consider intervening on if we want to slow down AI.
If, on the other hand, software dominates, then our picture looks quite different. To intervene on AI progress, we would probably want to look at where the smartest research scientists are working, and how we can influence them.
Ubiquity of transfer learning
Moving on, the second thread of research I want to talk about is the ubiquity of transfer learning, and the role it will play in building future AI systems.
Transfer learning refers to the degree to which learning one task meaningfully carries over to a different task. For example, the degree to which learning psychology transfers to one’s understanding of economics.
In the last several years, we’ve witnessed an interesting trend in the field of machine learning. Rather than trying to get a model to learn a task by training it from scratch on data collected specifically for that task, it’s now common to take a large transformer, called a foundation model, pre-train it on a very large, diverse corpus, and then fine-tune it on whatever task you’re interested in.
The fundamental reason for the recent trend towards foundation models is that we found a way to leverage transfer learning successfully from very cheap sources of data. Pre-training on a large dataset, like the internet, allows the model to efficiently learn downstream tasks that we care about, which is beneficial when we don’t have much data on the downstream task that we’re targeting. This increase in data efficiency is ultimately determined by the amount of transfer between the pre-training data and the fine-tuning task.
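To make the pattern concrete, here’s a minimal sketch of what the pre-train-then-fine-tune workflow looks like with the open-source Hugging Face libraries. The particular model, dataset, and hyperparameters below are arbitrary stand-ins, not anything discussed in the talk:

```python
# A minimal sketch of the foundation-model pattern: start from weights
# pre-trained on a large, cheap corpus, then fine-tune on the downstream
# task we actually care about. Model and dataset are arbitrary stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("imdb")  # stand-in for the downstream task
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

# The pre-trained weights already encode a lot of general knowledge about
# language, so fine-tuning only has to adapt them to this particular task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2_000)),
    eval_dataset=dataset["test"].shuffle(seed=0).select(range(2_000)),
)
trainer.train()
print(trainer.evaluate())
```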
Here’s one of the biggest questions that I’m interested in within this topic: to what extent will transfer learning alleviate important data bottlenecks in the future?
To make this question more concrete, consider the task of building general-purpose robots – the type that would allow us to automate a very large fraction of physical labor.
Right now, the field of robotics is progressing fairly slowly, at least relative to other areas of AI like natural language processing. My understanding is that a key reason for this relatively slow progress is that we’re bottlenecked on high quality robotic data.
Researchers try to mitigate this bottleneck by running robotic models in simulation, but these efforts have not been very successful so far because of the large transfer gap between simulation and reality.
But perhaps in the future, this transfer gap will narrow. Maybe, just like for language models, we can pre-train a model on internet data, and leverage the vast amount of knowledge about the physical world encoded in internet videos. Or maybe our robotic simulations will get way better. In that case, robotics might be less data constrained than we might have otherwise thought. And as a result of pre-training, we might start to see very impressive robotics relatively soon, alongside the impressive results we’ve already seen in natural language processing and image processing.
On the other hand, perhaps pre-training on internet data doesn’t transfer well at all to learning robotics, and our robotic simulations won’t get much better in the future either. As a consequence, it might take a really long time before we see general purpose robots.
In that case, we might soon be in a world with very smart computers, but without any impressive robots. Put another way, we’d witness world-class mathematician AIs before we got to see robots that work reliably at the level of a 3- or 4-year-old human.
Since it's plausible that dramatically increasing the growth rate of the economy requires general-purpose robotics, this alternative vision of the future implies that superintelligent AI could arrive many years or even decades before the singularity begins. In other words, there could be a long gap between when we get very cognitively advanced AI and when things actually start speeding up at a rate far above the historical norm.
Knowing which of these versions of the future ends up happening is enormously useful for understanding the ways in which future AI might be dangerous, and how we should try to intervene. If AI will become superintelligent but unable to act in the physical world for a long time, that would mean we’re facing a very different profile of risks compared to a situation in which AI soon outcompetes humans along every relevant axis more-or-less simultaneously.
I also think this question is very tractable to work on. Recall that the key factor that determines how easily we can alleviate bottlenecks in this framework is the degree to which training on a cheap pre-training distribution transfers knowledge to another distribution in which collecting data is expensive. As far as I can tell, this is something we can measure today for various distributions, using current tools.
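As one rough illustration of what such a measurement could look like in practice (a hypothetical sketch, not Epoch’s actual methodology; a serious version would fit scaling laws across many data budgets and distribution pairs), you could fine-tune the same architecture on a downstream task twice, once from pre-trained weights and once from random initialization, across several data budgets, and compare the results:

```python
# A rough sketch of measuring transfer: fine-tune the same architecture on a
# downstream task, once from pre-trained weights and once from random
# initialization, across several data budgets, and compare accuracy.
# Model, dataset, and budgets are arbitrary stand-ins.
import numpy as np
from datasets import load_dataset
from transformers import (AutoConfig, AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
dataset = load_dataset("imdb").map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

def finetune_and_score(pretrained: bool, n_examples: int) -> float:
    """Fine-tune on n_examples downstream examples and return test accuracy."""
    if pretrained:
        model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    else:
        # Same architecture, randomly initialized: no transfer from pre-training.
        config = AutoConfig.from_pretrained(MODEL_NAME, num_labels=2)
        model = AutoModelForSequenceClassification.from_config(config)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="transfer_probe", num_train_epochs=3),
        train_dataset=dataset["train"].shuffle(seed=0).select(range(n_examples)),
        eval_dataset=dataset["test"].shuffle(seed=0).select(range(2_000)),
        compute_metrics=accuracy,
    )
    trainer.train()
    return trainer.evaluate()["eval_accuracy"]

for budget in (250, 1_000, 4_000):
    gap = finetune_and_score(True, budget) - finetune_and_score(False, budget)
    print(f"{budget:>6} downstream examples: pre-training advantage = {gap:+.3f} accuracy")
```

The size of that gap, and how quickly it shrinks as the downstream data budget grows, is one crude proxy for how much the pre-training distribution transfers to the target distribution.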
Takeoff speeds
I’ll move now to the third thread of research, which is about takeoff speeds: in particular, whether we’ll have a gradual takeoff, with the economy slowly ramping up as AI gets more capable, or whether advanced AI will arrive suddenly and without much warning, with the consequence that a single very intelligent system takes over the world.
Unlike the first two threads, my remarks about this topic will be relatively brief. That’s because Tom Davidson is going to speak after me about the new model he just published that sheds light on this question. Instead of trying to explain his model before he does, I’ll try to give some context as to why this subject is important to study.
In short, knowing how fast takeoff will be helps us to understand which AI-safety problems will be solved by default, and which ones won’t be. A plausible rule of thumb is, the slower the takeoff, the more that will be solved by default. And if we don’t know what problems will be solved by default, we risk wasting a lot of time researching questions that humanity would have solved anyway in our absence.
Buck Shlegeris argued this point more persuasively than I probably will here, in a post published last year. Here’s my summary of his argument.
If a fast takeoff happens, then the EA x-risk community has a very disproportionate effect on whether the AI ends up aligned or not. Broadly speaking, if there’s a fast takeoff, our community needs to buckle down and solve alignment in time, or else we’re doomed.
But if a slow takeoff happens, the situation we’re in is vastly different. In a slow takeoff scenario, broader society will anticipate AI and it will be part of a huge industry. The number of people working on alignment will explode way beyond just this one community, as the anticipated consequences and risks of AI development will eventually be widely recognized.
That means that, in the slow takeoff case, we should focus not necessarily on the most salient or even the hardest parts of AI safety, but on the parts that can’t be done by the far more numerous researchers who will eventually work on this problem in parallel.
Concluding remarks
I want to take a step back for a second and focus again on the broader picture of forecasting AI.
I often get the sense that people think the only interesting question in AI forecasting is just pinning down the date when AGI arrives. If that’s your view, then it could easily seem like AI forecasting work is kind of pointless. Like, so what if AGI arrives in 2035 versus 2038? I actually totally agree with this intuition.
But I don’t really see the main goal of AI forecasting as merely narrowing down a few minor details that we’re currently not certain about. I think the type of work I just talked about is more about re-evaluating our basic assumptions about what to expect in the future. It’s more about clarifying our thinking, so that we can piece together the basic picture of how AI will turn out.
And I don’t think the three threads I outlined are anywhere near the only interesting questions to work on.
With that in mind, I encourage people to consider working on foundational AI forecasting questions as an alternative to more direct safety work. I think high-quality work in this space is pretty neglected. And that’s kind of unfortunate, because arguably we have many tools available to make good progress on some of these questions. So, perhaps give some thought to working with us on our goal to map out the future of AI.
Comments

This post was an excellent read, and I think you should publish it on LessWrong too.
I have the intuition that, at the moment, getting an answer to "how fast is AI takeoff going to be?" has the most strategic leverage, and that this topic, together with timelines, most influences the probability that we go extinct due to AI (although it seems to me that we're less uncertain about timelines than about takeoff speeds). I also think that a big part of why the other AI forecasting questions are important is that they inform takeoff speeds (and timelines). Do you agree with these intuitions?
Relatedly: If you had to rank AI-forecasting questions according to their strategic importance and influence on P(doom), what would those rankings look like?
I kind of object to the title of this post. It's not really AI forecasting you want, insofar as forecasting is understood as generating fairly precise numerical estimates through some combination of finding a reference class, establishing a base rate, and applying a beautiful sparkle of intuition. You're really making the case for informed speculation about AI, which is a different thing altogether. The climate analogy you make is pretty dubious, because we have a huge historical sample of past climates and at least some understanding of the things that drove climate change historically, so we can build some reasonably predictive climate models. This is not the case for AGI, and I doubt we actually can reduce our uncertainty much.
You're right that "forecasting" might not be the right word. Informed speculation might be more accurate, but that might confuse people, since there's already plenty of work people call "AI forecasting" that looks similar to what I'm talking about.
I also think that there are a lot of ways in which AI forecasting can be done in the sense you described, by "generating fairly precise numerical estimates through some combination of finding a reference class, establishing a base rate, and applying a beautiful sparkle of intuition". For example, if you look at Epoch's website, you can find work that follows that methodology, e.g. here.
I also agree that climate change researchers have much more access to historical data and, in some ways, the problem they're working on is easier than the problem I'm trying to work on. I still think that AI forecasting and climate forecasting are conceptually similar, however. And in fact, to the extent that AI plays a large role in shaping the future of life on Earth, climate forecasts should probably take AI into account. So, these problems are interrelated.
Yes, I think using the term "forecasting" for what you do is established usage - it's effectively a technical term. Calling it "informed speculation about AI" in the title would not be helpful, in my view.
Great post, btw.
I don’t think this is a well-specified intuition. It would probably be really valuable if people could forecast the ability to build/deploy AGI to within roughly 1 year, as it could inform many people’s career planning and policy analysis (e.g., when to clamp down on export controls). In this regard, an error/uncertainty of 3 years could potentially have a huge impact.
However, I think a better explanation for the intuition I (and possibly you) have is tractability—that such precision is not practical at the moment, and/or would have such diminishing marginal returns so as to make other work more valuable.
Yeah, being able to have such forecasting precision would be amazing. It's too bad it's unrealistic (what forecasting process would enable such magic?). It would mean we could see exactly when it's coming and make extremely tailored plans that could be super high-leverage.
Re "consider working on foundational AI forecasting questions as an alternative", what opportunities are available for people who are interested?
I don't personally have a strong sense as to what opportunities are available. I think Epoch and AI Impacts are both great organizations in this space. However, I think this type of work can plausibly also come from anywhere that thinks rigorously about the future of AI.
Thanks for sharing this - I'm sad I missed this talk, but really appreciate being able to quickly download your takes so soon after the event!
(I skimmed this so sorry if I just missed it, but…) I think you should also discuss the potential downside risks of AI forecasting (e.g. risks related to drawing attention to some ML approaches, that currently get very little attention, when listing the approaches that are most likely to enable a fast takeoff).