Strengthening Its Silicon Foundation, AWS Releases Two New AI Processors
AWS has unveiled two new chips designed to run cloud and AI workloads faster, cheaper, and more efficiently than their predecessors.
At the ongoing AWS re:Invent conference in Las Vegas, AWS introduced the next generation of its Trainium and Graviton chips, targeted at high-performance cloud and AI workloads. According to AWS, Graviton4 offers significantly better performance, more cores, and more memory bandwidth than Graviton3. Graviton is AWS's family of Arm-based server processors built for cloud workloads.
Graviton4 and Trainium2. Image used courtesy of AWS
The second chip, Trainium2, is a high-performance AI accelerator targeted at large-scale deployment in Amazon Elastic Compute Cloud (EC2) “UltraClusters” of up to 100,000 individual chips. These EC2 UltraClusters are designed to scale cloud computing power to meet the demands of large training jobs.
Graviton4: A Leap in Performance From Graviton3E
According to the official AWS technical guide on GitHub, previous chips in the lineup also aimed at high efficiency and performance. The Graviton3E, for instance, is powered by scalable Arm Neoverse V1 CPUs optimized for cloud-native workloads. The Neoverse V1 implements the Scalable Vector Extension (SVE), enabling the Graviton3E to adapt to different workloads. SVE, in contrast to traditional single instruction, multiple data (SIMD) architectures, allows the processor to work with different vector lengths determined at runtime instead of at compile time.
Vectors can be thought of as collections of elements that are processed in parallel. An example of a vector instruction in a traditional SIMD architecture is Intel's x86 _mm256_add_ps, which operates on fixed-size, 256-bit vectors. With SVE, by contrast, the vector length used in a computation is determined dynamically at runtime. For workloads that require smaller computations, smaller vectors can be used to increase energy efficiency. It’s not surprising, then, that AWS touted the Graviton3E for its 35% higher vector-processing performance.
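The vector-length-agnostic loop style that SVE enables can be illustrated with a minimal C sketch. This is not real SVE code (actual SVE programs use intrinsics such as svwhilelt_b32 and svadd_f32_m, or compiler auto-vectorization); here, the vec_len parameter stands in for the hardware vector length discovered at runtime, so the same function works unchanged whether the hardware processes 4 or 16 lanes per step:

```c
#include <stddef.h>

/* Illustrative sketch of an SVE-style, vector-length-agnostic loop.
   vec_len models the hardware vector length queried at runtime;
   the loop body never hard-codes a vector width. */
static void add_arrays(const float *a, const float *b, float *out,
                       size_t n, size_t vec_len) {
    for (size_t i = 0; i < n; i += vec_len) {
        /* Limit the final iteration to the remaining elements,
           as an SVE while-predicate would. */
        size_t lanes = (n - i < vec_len) ? (n - i) : vec_len;
        for (size_t j = 0; j < lanes; j++)
            out[i + j] = a[i + j] + b[i + j];
    }
}
```

A fixed-width SIMD version of this loop, by comparison, would be compiled for one specific vector size (e.g., eight floats per _mm256_add_ps) and would need a separate scalar tail loop for leftover elements.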
AWS created Graviton4 to take the performance and scalability improvements of Graviton3E further. Graviton4 is powered by the Arm Neoverse V2 CPU, which Arm says can double the performance of the Neoverse V1 used in the Graviton3E.
Architecture of the Arm Neoverse V2 CPU. Image used courtesy of Arm
Graviton4 also has enhanced security capabilities. The underlying Neoverse V2 CPU implements Armv9, which is inherently more secure than its predecessors because of its confidential computing architecture. In addition to a larger 2 MB L2 cache, Graviton4 implements branch target identification (BTI), another feature of the underlying Arm CPU architecture. BTI prevents indirect branches from jumping to unintended instructions, hardening code against control-flow attacks. AWS says Graviton4 is 40% faster for databases and 30% faster for web applications while still emphasizing security and scalability.
Trainium2 Speeds Up Training More Efficiently
One of the most compute-intensive aspects of artificial intelligence and machine learning is training, the process of "teaching" a model using a set of data. AWS Trainium specifically targets high-performance training infrastructure in the cloud.
A Trainium AI accelerator uses the AWS NeuronCore Architecture, with each accelerator having 32 GB of high-bandwidth memory (HBM) and delivering up to 190 TFLOPS of computing power. NeuronCore has separate engines for tensor (multidimensional array) computation, vector processing, and scalar processing.
Key features of the AWS NeuronCore Architecture. Image used courtesy of AWS
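The split between tensor, vector, and scalar engines can be made concrete with a minimal C sketch. The functions and the mapping below are illustrative assumptions, not the Neuron compiler's actual partitioning; they simply show the three different shapes of work that the separate engines are specialized for:

```c
#define N 2

/* Tensor-engine-style work: a matrix multiply over N x N operands. */
static void matmul(const float a[N][N], const float b[N][N], float c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0.0f;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* Vector-engine-style work: an elementwise activation (ReLU). */
static void relu(float c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (c[i][j] < 0.0f) c[i][j] = 0.0f;
}

/* Scalar-engine-style work: reduce the result to a single value. */
static float sum_all(const float c[N][N]) {
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += c[i][j];
    return s;
}
```

On hardware with dedicated engines, a compiler can schedule these three kinds of work onto their specialized units rather than running everything on one general-purpose pipeline.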
AWS says Trainium2 can train foundation models (FMs) and large language models (LLMs) up to four times faster than its first-generation predecessor when deployed in EC2 UltraClusters. AWS is also offering access to other coveted AI chips, such as Nvidia GPUs; some, including the GH200 Grace Hopper Superchips, will be accessible through the EC2 service.
AWS Expands Hardware Options
AWS aims to extend flexible compute hardware options to users—whether AWS silicon such as Trainium2 or Intel, Arm, and Nvidia chips. Generative AI, which creates content rather than merely labeling or classifying it, relies heavily on such large language models and foundation models to generate text and other content.
AWS is not without competitors. Microsoft Azure has developed the Azure Maia AI accelerator and the Azure Cobalt CPU, also targeted at generative AI applications. These could represent direct competition for AWS AI silicon in the months ahead.