The landscape of AI will be built on hardware, not just models
I had a very interesting conversation this week with some Chinese investors, and the question came up of whether it makes sense for chipmakers to design chipsets specifically for large language models (LLMs) or AI-based systems. The answer is definitely yes. In fact, I am sure hardware - and in particular chip design - is the critical next development in the landscape of AI.
We're all familiar with the crucial role Graphics Processing Units (GPUs) have played in the rise of AI, mainly due to their ability to handle massive numbers of calculations concurrently. But there are limits to the role of GPUs: there is a limit to how far a problem can be split into smaller parts that can be computed simultaneously, and coordinating the communication between all of those concurrent processes is a challenge in itself.
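One way to put a number on that limit is Amdahl's law (the standard formula for this effect, introduced here for illustration rather than taken from the conversation): however many parallel units you add, the overall speedup is capped by whatever fraction of the work has to remain serial. A quick back-of-the-envelope sketch in Python:

```python
# Amdahl's law: overall speedup from n parallel units when only a given
# fraction of the work can actually be parallelized.
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)

# Even if 95% of the work parallelizes perfectly, the speedup tops out
# near 20x no matter how many units you throw at it.
for n in (8, 128, 4096):
    print(f"{n:>5} units -> {amdahl_speedup(0.95, n):.1f}x speedup")
```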
While GPUs are incredibly versatile, they may not be the most efficient solution for every AI computation, especially the extensive matrix operations at the heart of neural networks. In essence, these operations multiply and add large arrays of numbers: weights (stored in matrices) are multiplied with inputs (also in matrices) to propagate information through the network. An AI-specific chip can be designed to optimize exactly these operations, reducing both computation time and power consumption.
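To make the point concrete, here is a minimal sketch (pure Python/NumPy, with purely illustrative names and sizes) of the kind of matrix arithmetic a single neural-network layer performs - the operation an AI-specific chip would be built to execute as cheaply as possible:

```python
import numpy as np

def dense_layer(x, W, b):
    """One fully connected layer: multiply weights by inputs, add bias, apply ReLU."""
    return np.maximum(W @ x + b, 0.0)

# Illustrative sizes only: a 512-dimensional input mapped to 1024 outputs.
x = np.random.rand(512)          # input activations
W = np.random.rand(1024, 512)    # weight matrix
b = np.zeros(1024)               # bias vector
y = dense_layer(x, W, b)         # output activations, fed to the next layer
```

A full model is essentially thousands of these multiply-and-add passes stacked and repeated, which is why a chip that does little else can win on speed and power.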
Beyond GPUs
This is where AI-specific chips or accelerators, which are designed to optimize these kinds of computations, can come into play. Their performance efficiency and reduced power consumption could change the rules of the game.
In the context of LLMs, there's potential for even more specific optimizations. AI chips for LLMs could be designed around the large memory capacity and high bandwidth these models require. Moreover, they could feature data pathways optimized for the kind of data LLMs work with - sequences of text in which each token follows the one before it. This is fundamentally different from data like images, where the whole input can be processed at once. To handle sequences, LLMs use architectures such as recurrent neural networks (RNNs) or Transformers: RNNs step through a sequence one element at a time, and while Transformers can look at a whole sequence in parallel during training, they still generate text one token at a time, with each new token depending on everything produced so far. A chip with specialized data pathways or circuitry for this kind of processing could make it considerably more efficient.
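As a toy illustration of that sequential dependence (the "model" below is a stand-in function, not a real LLM), here is the shape of the autoregressive loop behind text generation - each step needs the result of the previous one, so the steps cannot simply all be computed at once:

```python
from typing import List

def toy_next_token(tokens: List[int], vocab_size: int = 50_000) -> int:
    """Stand-in for a trained model: deterministically picks a next token from the context."""
    return (sum(tokens) * 31 + len(tokens)) % vocab_size

def generate(prompt: List[int], steps: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(steps):
        # Each new token depends on everything generated so far.
        tokens.append(toy_next_token(tokens))
    return tokens

print(generate([101, 202, 303], steps=5))
```

A chip built for this workload cares less about raw parallel throughput and more about keeping that loop - and the memory traffic it generates - fast.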
We also need to bear in mind the difference between training and inference. (You do always bear that in mind when talking about AI, don't you? Good.) Training a model demands enormous computational power, but inference - actually running the trained model - has different demands, and there is likely a growing need for chips optimized specifically for it, to allow real-time interaction with AI models.
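As a rough sketch of why the two workloads differ (assuming PyTorch purely for illustration; the tiny linear model is not meaningful), a training step runs a forward pass, a backward pass, and a weight update, while an inference step is a forward pass only:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(8, 16)
target = torch.randn(8, 4)

# Training step: forward + backward + weight update (heavy on compute and memory).
model.train()
loss = loss_fn(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference step: forward pass only, no gradients kept - the part a serving chip
# needs to make fast and cheap enough for real-time interaction.
model.eval()
with torch.no_grad():
    prediction = model(x)
```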
As of now, industry leaders like NVIDIA and Google, along with several semiconductor startups, are working on AI-specific chips.
Microsoft has been developing an internal artificial intelligence chip, codenamed Athena, since 2019, intended for training models such as LLMs.
NVIDIA, the market leader in AI training chips, faces competition from Google's TPUs and Amazon's Trainium. These are currently the main rival chips for developing LLMs, and each is only available through its respective cloud.

NVIDIA's GPUs include specialized hardware units known as Tensor cores, designed to accelerate operations for deep learning and AI. They perform mixed-precision matrix multiply-and-accumulate calculations - the fundamental arithmetic of most deep learning models - concurrently in dedicated hardware, which can significantly speed up both training and inference and improve power efficiency. Their benefits are most pronounced in workloads dominated by matrix multiplication and accumulation, like those found in deep learning. But because they operate on low-precision inputs, they can introduce numerical precision issues that need to be managed.
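To see why the accumulation precision matters, here is a small pure-NumPy illustration (an analogy, not how Tensor cores are actually programmed): the same dot product - the building block of matrix multiply-and-accumulate - drifts noticeably when the running sum is kept in float16 instead of float32:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(16_384).astype(np.float16)   # low-precision inputs
b = rng.random(16_384).astype(np.float16)

# Accumulate in float32 (roughly what mixed-precision hardware does).
acc_fp32 = np.float32(0.0)
for x, y in zip(a, b):
    acc_fp32 += np.float32(x) * np.float32(y)

# Accumulate entirely in float16: every partial sum gets rounded,
# and the error grows with the length of the vector.
acc_fp16 = np.float16(0.0)
for x, y in zip(a, b):
    acc_fp16 = np.float16(acc_fp16 + x * y)

print(f"float32 accumulation: {float(acc_fp32):.2f}")
print(f"float16 accumulation: {float(acc_fp16):.2f}")
```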
In general, data center operators are increasingly matching specific AI processors to specific workload needs to maximize performance. For instance, AWS uses Intel's Habana chips for vision-oriented machine learning models and its own Trainium chips for language models, because each design is better suited to its respective workload. We will likely see this kind of specialization become much more common in the future.
The landscape of AI will be shaped by the arrival of these AI-specific chips. The future of AI, it seems, is not just about evolving algorithms, but also about the transformative power of optimized hardware.