As models are pushed into every computer-mediated online interaction, training costs will likely be dwarfed by inference costs. NVidia's market cap may therefore be a misleading gauge of the potential magnitude of investment in inference infrastructure, since NVidia is not as well positioned for inference as it currently is for training. Furthermore, cloud-based AI inference requires low-latency access to a network data centre (DC), and that availability will likely be severely curtailed by the electrical power density physically available for scaling AI inference: cheap electricity for AI compute sits near nuclear and hydro power, typically far from major conurbations, which suits AI training but not latency-sensitive inference.
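To make the "inference dwarfs training" point concrete, here is a rough back-of-envelope sketch. Every figure in it (GPU-hours per training run, per-query energy, global query volume) is an illustrative assumption, not a measurement; the only point is that a one-off training budget is easily exceeded by a recurring, internet-scale inference load.

```python
# Back-of-envelope comparison of a one-off training energy budget vs. aggregate
# inference energy at internet scale. All constants are assumed, illustrative values.

TRAIN_GPU_HOURS = 5e6        # assumed GPU-hours for one large training run
TRAIN_KW_PER_GPU = 0.7       # assumed average draw per GPU, in kW
train_energy_kwh = TRAIN_GPU_HOURS * TRAIN_KW_PER_GPU   # one-off cost

QUERIES_PER_DAY = 1e9        # assumed query volume if models sit in "every interaction"
ENERGY_PER_QUERY_KWH = 0.003 # assumed energy per inference request, in kWh
DAYS_PER_YEAR = 365
infer_energy_kwh = QUERIES_PER_DAY * ENERGY_PER_QUERY_KWH * DAYS_PER_YEAR  # recurring, per year

print(f"training:  {train_energy_kwh:.2e} kWh (one-off)")
print(f"inference: {infer_energy_kwh:.2e} kWh per year")
print(f"ratio:     {infer_energy_kwh / train_energy_kwh:.0f}x")
```

With these assumed numbers the annual inference energy comes out a few hundred times the training run; the exact ratio is not the point, only that the recurring term dominates as usage scales.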
How would you factor in exponential growth specifically in AI inference? Do you think this will occur within the DC or in edge computing?
I suspect AI inference will be pushed out to smartphones, driven by both latency requirements and significant data privacy concerns. If that migration is inevitable, it will likely drive a huge amount of innovation in low-power ASIC neural compute.