TL;DR
This newsletter is about the semiconductor industry and AI development, and this is the second instalment in a three-part analysis of the topic. This part looks at the types of computer chips required for modern AI development and the role Nvidia plays in developing these chips.
Here are the key takeaways:
The development and deployment of modern AI systems requires sufficiently powerful chips that can cope with the size and complexity of these systems.
Graphics processing units (GPUs) are the most suitable chips for this task, and many cloud services used for AI research rely on data centres equipped with GPU-powered servers.
The advantage of GPUs is their ability to break large processing tasks down into smaller parts and perform them simultaneously, making them suitable not only for rendering computer graphics but also for training deep learning models.
Nvidia is the leading provider of GPUs for AI research, and its hardware powers many of the popular AI models and systems of today, including OpenAI's GPT models.
Why AI needs powerful chips
There are three main drivers of AI development: algorithms, data and computing power. Modern deep learning models require a lot of computing power in particular, due to their size, their complexity and the amount of training data used.
Deep neural networks feature a multi-layered architecture in which each layer consists of neurons, each holding a set of weights. These weights are essentially values that specify how the input is processed as it passes through the layers of the network.
When training these models, the goal is to identify a set of values for the weights that correctly maps the given inputs to the desired outputs. The most advanced models contain millions, or even billions, of these weights, which is what makes their internal workings so complex.
This architecture is then applied to large training datasets so the model can find the optimal values for its weights. The model processes all of this data through each of its layers and gradually adjusts the weights needed to match each input with the correct output.
Accordingly, the training dataset needs to be large enough for the model to identify the optimal set of weights. Once this has been achieved, the model should, in theory, perform well enough to be used in a real-world environment.
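To make this more concrete, the snippet below is a minimal sketch (assuming Python and the PyTorch library, neither of which the text above specifies) of a tiny neural network whose weights are gradually adjusted so that inputs map to the desired outputs. Models like LLMs work on the same principle, just with billions of weights rather than a few hundred.

```python
# A minimal sketch of neural-network training, assuming PyTorch is installed.
# Real models work the same way, just with millions or billions of weights.
import torch
import torch.nn as nn

# A tiny multi-layered network: each layer holds a set of learnable weights.
model = nn.Sequential(
    nn.Linear(10, 32),  # layer 1: 10 inputs feed 32 neurons
    nn.ReLU(),
    nn.Linear(32, 1),   # layer 2: 32 neurons feed 1 output
)

# Toy training data standing in for a real dataset.
inputs = torch.randn(256, 10)
targets = torch.randn(256, 1)

loss_fn = nn.MSELoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop: nudge the weights so the inputs map to the desired outputs.
for epoch in range(100):
    predictions = model(inputs)
    loss = loss_fn(predictions, targets)
    optimiser.zero_grad()
    loss.backward()   # work out how each weight should change
    optimiser.step()  # update the weights

print(f"Weights in this tiny model: {sum(p.numel() for p in model.parameters()):,}")
```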
This combination of model complexity and training-data scale certainly applies to large language models (LLMs). Such models stack many layers of neural networks and go through an extensive training process that uses huge amounts of text data.
LLMs like ChatGPT therefore require a huge amount of computing power, which can get really expensive. This is something that even Sam Altman, CEO of OpenAI, has admitted:
The cost arises from the fact that developing and running deep learning models requires a lot of energy. Back in February 2023, one analysis predicted that running ChatGPT cost over $600,000 a day. Another post, exploring the model architecture for GPT-4, noted how inference (i.e., users running queries on the model) is one of the biggest issues for companies like OpenAI:
The real battle is that scaling out these models to users and agents costs far too much. The costs of inference exceed that of training by multiple folds. This is what OpenAI's innovation targets regarding model architecture and infrastructure.
GPUs and AI
Currently, the chips used to power the development of large models like ChatGPT and other advanced AI systems are graphics processing units (GPUs).
The utility of these chips for developing AI models came to prominence in 2008 through the work of Geoffrey Hinton, a pioneer of deep neural networks, and Li Deng, a researcher at Microsoft at the time. While working on speech recognition tools, Hinton was experimenting with neural networks, and both he and Deng recognised that sufficiently powerful processors were needed for this work.
This is when they turned to GPUs:
Silicon Valley chip makers like Nvidia originally designed these chips as a way of quickly rendering graphics for popular video games like Halo and Grand Theft Auto, but somewhere along the way, deep learning researchers realized GPUs were equally adept at running the math that underpinned neural networks. In 2005, three engineers had tinkered with the idea...and a team at Stanford University stumbled onto the same technical trick around the same time. These chips allowed neural nets to learn from more data in less time...The difference was that GPUs were off-the-shelf hardware. Researchers didn't have to build new chips to accelerate the progress of deep learning. Thanks to games like Grand Theft Auto and gaming consoles like the Xbox, they could use chips that were already there for the taking. In Toronto, Hinton...trained [his] speech recognition system using these specialized chips, and this is what pushed it beyond the state of the art.1
Before then, central processing units (CPUs) like Intel's were relied on for machine learning (ML) development. But with the rise of more complex AI systems, CPUs quickly became inadequate for the needs of such systems.
The biggest drawback of CPUs in the context of AI development is that they perform their calculations largely serially (i.e., one after another). So while these chips can be used for AI development, they do not cope well with the scale of computation required to train AI models, especially neural networks, and therefore slow down development.
What is needed for AI are chips that can run the necessary calculations faster whilst taking up less space and using less power than CPUs. This is where GPUs come in:
These chips are designed to run multiple iterations of the same calculations at the same time; this is called 'parallel processing'.
This is not only helpful for rendering images in video games but also for training AI models efficiently.
Thus, GPUs are much better equipped than Intel's CPUs to serve AI's demand for compute.
With this parallel processing capability, GPUs are very useful for the heavy computation involved in developing deep learning models. These chips take the large processing tasks these models require, break them down into smaller parts, and perform those parts simultaneously. As a result, developers can train and run inference (i.e., queries) on models built with neural network architectures much faster on GPUs.
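As a rough illustration (a sketch assuming Python with PyTorch and a CUDA-capable Nvidia GPU, neither of which is specified above), the same large matrix multiplication, the core operation behind neural networks, can be timed on a CPU and then on a GPU, where the work is split across thousands of cores running in parallel:

```python
# A rough sketch comparing the same workload on a CPU and a GPU,
# assuming PyTorch and a CUDA-capable Nvidia GPU are available.
import time
import torch

size = 4096
a = torch.randn(size, size)
b = torch.randn(size, size)

# CPU: the multiplication is handled by a handful of cores, largely in sequence.
start = time.perf_counter()
a @ b
cpu_time = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    # GPU: the same work is broken into smaller parts and run in parallel
    # across thousands of cores.
    start = time.perf_counter()
    a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous GPU work to finish
    gpu_time = time.perf_counter() - start
    print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
else:
    print(f"CPU: {cpu_time:.3f}s (no CUDA GPU detected)")
```

On typical hardware the GPU finishes the multiplication many times faster than the CPU, and that gap compounds when a model has to run this kind of operation billions of times during training.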
Nvidia's entrance into AI
As mentioned in Talking chips (Part 1), Nvidia originally started building GPUs to serve the 3D computer graphics market. But with a growing demand for GPUs among AI developers, the company eventually began making products to serve this lucrative market:
Building a neural network was a task of trial and error, and with tens of thousands of GPU chips at their disposal, researchers could explore more possibilities in less time. The same phenomenon quickly galvanized other companies. Spurred by the $130 million in graphics chips sold to Google, Nvidia reorganized itself around the deep learning idea, and soon it was not merely selling chips for AI research, it was doing its own research, exploring the boundaries of image recognition and self-driving cars, hoping to expand the market even further.2
Accordingly, starting in the mid-2000s, Nvidia adapted its GPUs to handle the workloads required for AI development. This enabled the company to sell its chips to cloud companies wanting to equip their data centres with GPU-based servers.
Eventually, in 2022, data centre sales became Nvidia's biggest business as it rode the AI wave. Its revenue in this area has grown dramatically over the last few years, driven primarily by the current hype around generative AI (genAI) models.
Today, Nvidia's GPUs power the most advanced data centres used to develop and deploy modern AI systems. A model like GPT-4, for example, was likely trained on tens of thousands of GPUs provisioned via Azure under OpenAI's key partnership with Microsoft.
Furthermore, if the development of GPT-5 involves increasing the number of parameters in the model, then more computing resources will inevitably be required to fuel this work, as Altman confirmed in an interview with the Financial Times in November 2023:
To train its models, OpenAI, like most other large AI companies, uses Nvidia's advanced H100 chips, which became Silicon Valley's hottest commodity over the past year as rival tech companies raced to secure crucial semiconductors needed to build AI systems.
Altman said there had been "a brutal crunch" all year due to supply shortages of Nvidia's $40,000-a-piece chips. He said his company had received H100s, and was expecting more soon, adding that "next year looks already like it's going to be better."
Nvidia's role in the current AI hype cycle
During this latest wave of genAI hype, Nvidia has played a key role, helping to resolve the compute problem and enabling developers like OpenAI to build and deploy their models:
The Nvidia GPU architecture is inherently well-tailored to running dense matrix models, which has driven transformers to scale in capabilities much faster than any other model architecture. With increased capabilities also came an explosion in popularity. Scale means almost everything in AI, and so the billion-dollar question is if these models will be able to continue to scale with existing dense architectures for years to come.
Nvidia provides hardware for both the training and inference of large genAI models. Its latest product for this is the H200, a data centre GPU which started shipping this year.
According to Nvidia, the H200 comes equipped with larger and faster memory that "fuels the acceleration of generative AI and large language models (LLMs) while advancing scientific computing for [high-performance computing] workloads." Inference on GPT-3, which has 175 billion parameters, runs up to 1.6 times faster than on the previous generation of the company's GPUs.
These chips powered OpenAI's demo of GPT-4o back in May, underlining Nvidia's ambition to remain the leading provider of compute for AI and the systems built with it:
Additionally, Nvidia offers server rack solutions tailored for AI development and deployment. Its GB200 NVL72 racks connect multiple GPUs so that they function as one big, more energy-efficient GPU.
Sales of these data centre GPUs make up over 86% of Nvidia's revenue, which totals $26 billion. This is shown in the graphic below:
Several important partnerships contribute to these data centre sales. For example, both Google and Microsoft have expanded their partnerships with Nvidia, adopting its GPU hardware to improve their cloud services for developing and managing AI applications.
Cloud service providers are Nvidia's biggest customers, contributing over half of its data centre GPU revenue. Other data centre customers include consumer internet companies and other enterprises wanting to develop their own AI models.
Cade Metz, Genius Makers: The Mavericks Who Brought AI to Google, Facebook and the World (Penguin Random House 2021), p.72.
Cade Metz, Genius Makers: The Mavericks Who Brought AI to Google, Facebook and the World (Penguin Random House 2021), p.140.