Hi, thanks for writing this up. I agree the macro trends in hardware, software, and algorithms are unlikely to hold indefinitely. That said, I mostly disagree with this line of thinking. More precisely, I find it unconvincing because there just isn't much empirical evidence for or against these macro trends (e.g. natural limits to the growth of knowledge), so I don't really see how you can use this reasoning to rule out certain endpoints as possibilities. And when I see an industry exec make a statement about Moore's Law, I generally assume it is meant to reassure investors that the company is on the right path this quarter rather than to make a profound forward-looking claim about the future of computing. For example, since that 2015 quote, Intel lost the mobile market, fell far behind on GPUs, and is presently losing the datacenter market.
There are a number of well-funded AI hardware startups right now, and a lot of money and potential improvements on hardware roadmaps, including but not limited to exotic materials, 3D stacking, high-bandwidth interconnects, new memory architectures, and dataflow architectures. On the AI side, techniques like distillation and pruning seem to be effective at letting much smaller models perform nearly as well. Altogether I don't know if this will be enough to keep Moore's Law (and whatever you'd call the superlinear trend in AI models) going for another few decades, but I don't think I'd bet against it, either.
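(For anyone unfamiliar with distillation: it trains a small "student" model to mimic a larger "teacher" model's output distribution, which is a big part of why much smaller models can get surprisingly close. A minimal sketch of the standard loss, assuming PyTorch-style logits; the temperature and mixing weight are just illustrative values.)

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # the usual T^2 scaling keeps gradient magnitudes comparable
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```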
> Machine learning involves repetitive operations which can be processed simultaneously (parallelization)
I agree, but of course Amdahl's Law remains in effect.
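Concretely, Amdahl's Law says that if a fraction p of the work parallelizes perfectly across N workers, the overall speedup is 1 / ((1 − p) + p/N), so the serial remainder quickly becomes the ceiling. A tiny sketch (the 95% figure is just an illustrative assumption):

```python
def amdahl_speedup(p, n):
    """Max speedup when a fraction p of the work parallelizes across n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the workload perfectly parallel, the serial 5% caps you at 20x.
print(amdahl_speedup(0.95, 1_000))      # ~19.6
print(amdahl_speedup(0.95, 1_000_000))  # ~20.0
```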
> The goal of hardware optimization is often parallization (sic)
Generally, when designing hardware, the main goals are increased throughput or reduced latency (for some representative set of workloads). Parallelization is one particular technique that can help achieve those goals, but there are many ideas/techniques/optimizations one can apply.
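As a toy illustration of why those two goals pull apart (all numbers made up): suppose a batch of B inferences costs a fixed launch overhead plus a per-item cost. Bigger batches buy throughput at the expense of latency, and which one you care about shapes the design.

```python
# Toy cost model: a batch of B inferences takes t(B) = 1 ms fixed + 0.1 ms per item.
# Entirely assumed numbers, just to show the throughput-vs-latency tension.
for B in (1, 8, 64):
    latency_ms = 1.0 + 0.1 * B            # time until this batch's results are ready
    throughput = B / latency_ms * 1000.0  # completed inferences per second
    print(f"B={B:2d}: latency {latency_ms:4.1f} ms, throughput {throughput:6.0f}/s")
```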
> The widespread development of machine learning hardware started in mid-early 2010s and a significant advance in investment and progress occurred in the late 2010s
Sure... I mean, deep learning wasn't even a thing until 2012. I think the important concept here is that hardware designs have a long time horizon (generally 2-3 years), because it takes that long to do a clean-sheet design, and also because if you're spending millions of dollars to design, tape out, and manufacture a new chip, you need to be convinced that the workload is real and that people will still be using it years from now when you're trying to sell your new chip.
> CUDA optimization, or optimization of low-level instruction sets for machine learning operations (kernels), generated significant improvements but has exhausted its low-hanging fruit
Like the other commenter, I think this could be true, but I'm not sure what the argument for it is. And again, it depends on the workload. My recollection is that even early versions of cuDNN (circa 2015) were good enough that you got >90% of the max floating-point performance on at least some of the CNN workloads common at that time (of course transformers weren't invented yet).
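For reference, that kind of utilization figure is just (FLOPs the layer mathematically requires) / (measured time × the chip's peak FLOP rate). Everything concrete below (layer shape, timing, peak throughput) is a made-up placeholder to show the arithmetic, not a measurement:

```python
# Hypothetical 3x3 conv layer: batch 128, 56x56 output, 64 -> 64 channels (all assumed).
batch, h, w = 128, 56, 56
c_in, c_out, k = 64, 64, 3

# One multiply-add (2 FLOPs) per output element per input channel per kernel tap.
flops = 2 * batch * h * w * c_out * c_in * k * k   # ~29.6 GFLOPs

peak_flops_per_s = 6e12    # assumed ~6 TFLOP/s FP32 peak for a mid-2010s GPU
measured_time_s = 5.5e-3   # assumed kernel runtime

utilization = flops / (measured_time_s * peak_flops_per_s)
print(f"{flops / 1e9:.1f} GFLOPs -> {utilization:.0%} of peak")
```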
> The development of specialized hardware and instruction sets for certain kernels leads to fracturing and incentivizes incremental development, since newer kernels will be unoptimized and consequently slower
This could be true, I suppose. But I'm doubtful, because those hardware designs are being produced by companies that have studied the workloads and are convinced they can do better. If anything, competition may incentivize all hardware manufacturers to spend more time optimizing kernel performance than they otherwise would.
> intermediate programs (interpreters, compilers, assemblers) are used to translate human programming languages into increasingly repetitive and specific languages until they become hardware-readable machine code. This translation is typically done through strict, unambiguous rules, which is good from an organizational and cleanliness perspective, but often results in code which consumes orders of magnitude more low-level instructions (and consequently, time) than if they were hand-translated by a human. This problem is amplified when those compilers do not understand that they are optimizing for machine learning: compilation protocols optimized to render graphics, or worse for CPUs, are far slower.
This is at best an imperfect description of how compilers work. I'm not sure what you mean by "repetitive", but yeah, the purpose is to translate high-level languages to machine code. However:
I'm curious: is the FTX stake in Anthropic now valuable enough to plausibly bail out FTX? Or at least to put a dent in the amount owed to the customers who were scammed?
I've lost track of the gap between assets and liabilities at FTX, but this is a $4B investment for a minority stake, according to news reports. Which implies Anthropic has a post-money valuation of at least $8B. Anthropic was worth $4.6B in June according to this article. So the $500M stake reportedly held by FTX ~~should~~ might be worth around double whatever it was worth in June, and possibly quite a bit more.

Edit: this article suggests the FTX asset/liability gap was about $2B as of June. So the rise in valuation of the Anthropic stake is certainly a decent fraction of that, though I'd be surprised if it's now valuable enough to cover the entire gap.
Edit 2: the math is not quite as simple as I made it seem above, and I've struck out the word "should" to reflect that. Anyway, I think the question is still the size of the minority share that Amazon bought (which has not been made public AFAICT) as that should determine Anthropic's market cap.
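To spell out why the simple doubling doesn't follow: if the $4B is all new (primary) investment, FTX keeps its shares but gets diluted, so its stake is worth roughly (its pre-round ownership fraction) × (the round's pre-money valuation). A rough sketch, taking the reported $500M-at-$4.6B figures as given, ignoring liquidation preferences and any earlier dilution, and treating the post-money values other than the $8B floor as purely hypothetical:

```python
ftx_stake_june = 0.5e9    # reported $500M stake
june_valuation = 4.6e9    # reported June valuation
new_money = 4.0e9         # Amazon's reported investment

ftx_fraction = ftx_stake_june / june_valuation   # ~10.9% ownership (assumed)

for post_money in (8e9, 15e9, 30e9):             # $8B is the "minority stake" floor
    pre_money = post_money - new_money
    # Primary round: FTX's share count is unchanged, so its stake is worth
    # roughly its old fraction times the pre-money valuation.
    stake_value = ftx_fraction * pre_money
    print(f"post-money ${post_money / 1e9:.0f}B -> FTX stake ~${stake_value / 1e9:.2f}B")
```

On these made-up post-money figures, covering a ~$2B gap would require the round to value Anthropic well above the $8B floor, which is exactly why the undisclosed size of Amazon's share is the crux.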