Understanding the Compute Hardware Behind Generative AI
Generative AI tools like ChatGPT have had a huge impact on numerous sectors of society. As engineers, it’s helpful for us to understand the computing technology that makes them possible.
Artificial intelligence has made significant leaps forward in recent years as new technologies emerge at an unprecedented rate. There’s no doubt that tools like ChatGPT, Bard, and Einstein will impact industries across the board—from media and content creation to research, finance and beyond.
These tools can now closely simulate human conversation, with the ability to comprehend contextual information, converse in real time, and execute tasks from translation to summarization with remarkable precision.
Figure 1. ChatGPT is just the beginning of where AI-based tools are headed. Image from Adobe Stock
To give a sense of the speed at which these AI technologies have advanced, “the amount of compute used on the largest AI training runs has been increasing exponentially with a 3.4-month doubling time,” according to OpenAI. Looking back to the ’80s and ’90s, an era when neural networks saw intense research activity, the compute and memory available for training engines were incredibly weak compared to today’s capabilities.
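To see how quickly that compounds, a short back-of-envelope calculation helps (this arithmetic is added here for illustration and is not taken from OpenAI’s post): with a 3.4-month doubling time, total growth is 2 raised to the number of elapsed months divided by 3.4.

```python
# Back-of-envelope: how a 3.4-month doubling time compounds over time.
# growth factor = 2 ** (elapsed_months / doubling_time_months)
doubling_time_months = 3.4

for years in (1, 3, 5, 6):
    months = years * 12
    growth = 2 ** (months / doubling_time_months)
    print(f"{years} year(s): ~{growth:,.0f}x more training compute")

# Five to six years at this pace yields a factor in the hundreds of
# thousands to millions, in line with the growth OpenAI charted for
# 2012-2018 (see Figure 2).
```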
Flash forward to 2012, and the relentless progression of Moore’s Law enabled AI to efficiently perform classification—identifying objects in pictures and video. At the same time, the development of natural language processing (NLP) was vitally important.
Figure 2. The total amount of compute, in petaflop/s-days, used to train selected results between 2012 and 2018. Image used courtesy of OpenAI
As tools like Siri began to take off and steadily evolve alongside these applications, the obvious next step was getting them to generate content. Once these training algorithms could quickly classify information, draw correlations between data, and understand requests, the question became whether they could use those learnings to efficiently assemble new content in a recognizable way.
Thanks to the semiconductor industry’s advancements in HBM (high-bandwidth memory), DDR, heterogeneous computing, and more, that has become a reality as we enter a new era of generative AI.
How Hardware Got Us Here
The training and application of generative AI are complex, relying on advanced learning models with massive data-processing needs. Advancements in computing, especially in main memory, were indispensable as the generative AI applications we’re familiar with today were still in their fledgling stages.
The last decade in particular saw large advances in AI training and inference capabilities due to generational upgrades in DDR DIMM chipsets and HBM interfaces, as well as domain-specific computing architectures. These all played pivotal roles in generative AI development, helping to improve speed, capacity, and connectivity to satisfy increasingly demanding workloads.
The machine learning algorithms used to create new images, audio and text through generative AI require vast amounts of data and fast memory to run effectively. DDR5, the latest standard for DDR memory, offers higher data transmission rates and lower power consumption, which facilitates much more efficient data processing at low latencies compared to prior generations.
Figure 3. Block diagram of a Rambus HBM3 memory subsystem. Image used courtesy of Rambus
Simultaneously, high-bandwidth memory (HBM), delivered through interface solutions like those from Rambus, has empowered AI accelerators by improving compute speeds and energy efficiency, with successive generations significantly increasing memory bandwidth and capacity.
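A rough peak-bandwidth estimate, data rate multiplied by bus width, shows why these interface advances matter. The sketch below uses nominal per-device figures (64-bit DIMM buses and 1,024-bit HBM stacks at representative JEDEC data rates); real systems aggregate multiple channels or stacks and rarely sustain theoretical peaks.

```python
# Rough peak-bandwidth estimates: bandwidth = transfer rate x bus width.
# Nominal per-device figures only; actual systems use several channels or
# stacks and rarely sustain the theoretical peak.

def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s from transfer rate (MT/s) and bus width (bits)."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9

devices = {
    "DDR4-3200 DIMM (64-bit)": (3200, 64),
    "DDR5-6400 DIMM (64-bit)": (6400, 64),
    "HBM2E stack (1,024-bit)": (3600, 1024),
    "HBM3 stack (1,024-bit)":  (6400, 1024),
}

for name, (rate, width) in devices.items():
    print(f"{name:26s} ~{peak_bandwidth_gbs(rate, width):7.1f} GB/s")
```

The order-of-magnitude jump from a DDR channel to an HBM stack comes largely from the much wider interface, which is why HBM is co-packaged alongside the accelerator die rather than socketed like a DIMM.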
The Move to Heterogeneous Computing
Servers themselves also started moving to a heterogeneous computing architecture, as purpose-built accelerators are increasingly used to offload specialized workloads from CPUs. An example heterogeneous computing system consists of a CPU, an AI accelerator, and a network processor.
These units each perform different types of computations to enable generative AI, since complex computations run faster on specialized processing units. A CPU, for example, handles general-purpose processing tasks and can offload certain workloads to specialized processors like the AI accelerator.
The AI accelerator can accelerate tensor operations, improving the speed of neural network training and inference. And the network processor can improve the speed at which data is moved across the network to the CPU and AI accelerator in the server.
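As a simplified illustration of that division of labor, the sketch below assumes PyTorch as the framework (the article does not name a specific software stack): the CPU prepares the data, and the dense tensor math is dispatched to an accelerator when one is available.

```python
# Minimal sketch of CPU -> accelerator offload. PyTorch is used purely as an
# illustrative framework; no specific stack is specified in the article.
import torch

# General-purpose work (loading, tokenizing, batching) stays on the CPU...
batch = torch.randn(64, 1024)       # stand-in for a preprocessed input batch
weights = torch.randn(1024, 4096)   # stand-in for one model layer

# ...while dense tensor operations are offloaded to an AI accelerator if present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch, weights = batch.to(device), weights.to(device)

# The accelerator executes the matrix multiply and activation; results stay in
# device memory and only move back to the host when actually needed.
activations = torch.relu(batch @ weights)
print(activations.device, activations.shape)
```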
By harnessing the strengths of each processing unit, generative AI systems can produce high-quality output with greater efficiency. Memory cache coherency, facilitated by processors and new standards like Compute Express Link (CXL), also plays a critical role here, as it enables CPUs and accelerators to share memory resources.
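CXL coherency is implemented in hardware, but the idea it enables, multiple compute elements working on one pool of memory rather than exchanging copies, can be loosely illustrated in software. The toy sketch below uses Python’s standard shared-memory module purely as an analogy; it is not a CXL interface.

```python
# Toy analogy for a shared, coherent memory pool (NOT a real CXL API):
# a second process attaches to the same buffer instead of receiving a copy.
from multiprocessing import Process, shared_memory

def accelerator_task(pool_name: str) -> None:
    # The "accelerator" attaches to the buffer the "CPU" created...
    shm = shared_memory.SharedMemory(name=pool_name)
    shm.buf[0] = shm.buf[0] * 2     # ...and operates on it in place.
    shm.close()

if __name__ == "__main__":
    pool = shared_memory.SharedMemory(create=True, size=16)
    pool.buf[0] = 21                # the "CPU" writes into the shared pool
    worker = Process(target=accelerator_task, args=(pool.name,))
    worker.start()
    worker.join()
    print("CPU sees the accelerator's result:", pool.buf[0])  # -> 42
    pool.close()
    pool.unlink()
```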
Speeding Up AI Training
These processors drastically speed up AI training and inference, reducing total cost of ownership and allowing for greater scalability. Altogether, this has allowed researchers to move beyond classification into content generation itself. The cumulative effect of these improvements on how quickly today’s models learn is easy to see.
In just the last few months, we’ve seen ChatGPT improve by leaps and bounds. What debuted as a new technology to experiment with in November 2022 became capable enough, only a few months later, to pass the bar exam in the top 10% of test takers, according to an ABA Journal article.
While current capabilities demonstrate the impressive potential of AI, they merely scratch the surface of what could be achieved in the future. As we begin to consider how these technologies could revolutionize the way we communicate and do business, a new question emerges: What is the next possible stage for generative AI?
Semiconductors and the Next Stage of ChatGPT
ChatGPT and similar tools are evolving so fast that more advanced capabilities will move into the mainstream very quickly. AI will move beyond text and voice input and has the potential to include new capabilities, like interpreting emotion and nuance, in the near future. This would be a game changer for customer service, entertainment, gaming, and other industries, but how do we get there?
Figure 4. Artificial intelligence, in the form of generative AI-based tools like ChatGPT and others, relies on continuing advances from the semiconductor industry. Image used courtesy of Rambus
The rapid evolution of AI has shown that the industry reaches the limits of existing hardware very quickly. To continue moving AI forward, the hardware powering generative AI must couple even more advanced computing power with higher-bandwidth memory interconnects and storage. This will require fast-paced innovation from the semiconductor industry, along with a commitment to coordinated efforts that address the bottlenecks between memory and processing.
An Industry That’s All In
As new demands arise, it will be challenging to continue advancing at the current pace. Thankfully, the industry is “all in” on its push to improve memory, which has always played an important role in enabling emerging computing paradigms.
The amount of compute used on the largest AI training runs would not have grown by 300,000× in the past 11 years if it weren’t for the semiconductor industry’s ability to produce faster chips and interconnects.
New technologies on the horizon demonstrate the industry’s investment in advancing memory technologies and exploring new architectures to continue to improve AI.