Computing on Andromeda: Chip Company Releases 13.5M Core Supercomputer
Known for its dinner plate-sized AI chips, Cerebras has now released an AI supercomputer available for commercial and academic research.
Artificial intelligence company Cerebras recently unveiled its Andromeda supercomputer, optimized specifically for deep learning applications. Cerebras claims that Andromeda has more cores than 1,953 NVIDIA A100 GPUs and 1.6 times as many cores as the Frontier supercomputer.
The company aims to remove common obstacles of general-purpose computing platforms, such as the extra overhead of distributing training across a cluster of GPUs. Some distributed systems require a 3D parallelism configuration that is complex for engineers to set up.
3D parallelism in a distributed general-purpose GPU cluster.
Andromeda uses an AI-specific hardware architecture that Cerebras says provides near-linear scaling of training across several GPT language models; in simpler terms, training time falls in proportion to the number of compute cores involved. General-purpose GPU clusters, by contrast, scale sub-linearly: each added compute unit yields a smaller reduction in training time.
Cerebras claims that similar work is not possible on a cluster of 2,000 NVIDIA A100 GPUs due to memory and bandwidth limitations.
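To make the scaling claim above concrete, here is a minimal sketch of how linear and sub-linear scaling can be compared. The function names and all the hour figures are hypothetical and chosen for illustration; they are not measurements from Cerebras or NVIDIA.

```python
def ideal_time(single_unit_hours: float, units: int) -> float:
    """Training time under perfectly linear scaling: T(N) = T(1) / N."""
    return single_unit_hours / units


def scaling_efficiency(single_unit_hours: float, units: int, measured_hours: float) -> float:
    """Fraction of the ideal speedup actually achieved: (T(1) / T(N)) / N."""
    speedup = single_unit_hours / measured_hours
    return speedup / units


# Hypothetical numbers for illustration only.
baseline_hours = 160.0                       # assumed time on one compute unit
linear_16 = ideal_time(baseline_hours, 16)   # time on 16 units if scaling were linear
sublinear_16 = 16.0                          # assumed measured time on a sub-linear cluster

print(linear_16)                                              # 10.0
print(scaling_efficiency(baseline_hours, 16, linear_16))      # 1.0
print(scaling_efficiency(baseline_hours, 16, sublinear_16))   # 0.625
```

An efficiency of 1.0 corresponds to linear scaling; anything below 1.0 means part of the added hardware is lost to communication or other overhead.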
Key Specifications of the Andromeda Supercomputer
Andromeda is built on Cerebras' CS-2 systems and includes a total of 13.5 million AI-optimized compute cores, fed by 18,176 third-gen AMD EPYC processor cores.
Andromeda is reported to provide near-linear scaling across the GPT language models.
The supercomputer, which also employs wafer-scale clustering and weight streaming, is supported by Cerebras’ MemoryX and SwarmX technologies. Andromeda yields one exaflop of AI compute and 120 petaflops of dense compute with 16-bit half precision.
Andromeda hosts 16 CS-2 systems, each with a Cerebras Wafer-Scale Engine 2 (WSE-2) processor, a 46,225 mm² chip with 2.6 trillion transistors built on a 7 nm process. Cerebras describes the WSE-2 as the “largest processor on Earth.” The supporting software platform integrates PyTorch and TensorFlow out of the box.
The Cerebras CS-2
Here are a few other specifications of the CS-2:
- 850,000 AI-optimized compute cores
- 40 GB integrated SRAM
- 20 PB/s memory bandwidth
- 220 Pb/s interconnect bandwidth
- 1.2 Tb/s I/O
- 12x 100 Gigabit Ethernet links
- 15 rack units (RU)
- Water cooled
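As a quick sanity check, the figures quoted above can be combined with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and treating 1 GB as 10**9 bytes is an assumption.

```python
# Figures quoted in the article.
systems = 16                 # CS-2 systems in Andromeda
cores_per_cs2 = 850_000      # AI-optimized compute cores per CS-2
sram_bytes = 40e9            # 40 GB, assuming 1 GB = 10**9 bytes
transistors = 2.6e12         # transistors per WSE-2
die_area_mm2 = 46_225        # WSE-2 area in square millimetres

total_cores = systems * cores_per_cs2
sram_per_core_kb = sram_bytes / cores_per_cs2 / 1e3
transistors_per_mm2 = transistors / die_area_mm2

print(f"{total_cores:,} cores")                                    # 13,600,000 cores
print(f"{sram_per_core_kb:.1f} kB of SRAM per core")               # 47.1 kB of SRAM per core
print(f"{transistors_per_mm2 / 1e6:.1f} M transistors per mm^2")   # 56.2 M transistors per mm^2
```

Sixteen systems at 850,000 cores each come to 13.6 million, slightly above the 13.5 million headline figure; the article does not explain the difference.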
Wafer-scale clustering makes the most of the size of the WSE-2 by fitting entire neural networks, comprising both the compute component and the parameter component, within a single processor. It also takes advantage of data parallelism: each CS-2 in a cluster runs the same model on a different portion of the training data. Choosing the number of systems in a cluster is as simple as setting a parameter.
With a single keystroke, users can distribute training across clusters of CS-2 systems.
This eliminates the need to plan and configure training a model over a distributed system, which can be complex, slow, and power-hungry. Additionally, since the AI computation is done on a single device, the training is faster.
Weight streaming is supported by Cerebras’ MemoryX and SwarmX technologies. MemoryX manages the storage of model weights off-chip, including streaming weights back to the on-processor models, calculating updated weights, and timing the delivery. According to Cerebras, MemoryX can support models with 200 billion to 120 trillion parameters and performs as if the weights were stored on-chip.
Weight streaming for a cluster of CS-2s.
SwarmX is another supporting technology that sits between MemoryX and the CS-2 systems. It dispatches weights to the CS-2 systems and returns the resulting gradients to MemoryX. Together, these two technologies achieve weight streaming during training.
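The division of labor described above can be illustrated with a toy, single-process simulation. This is a sketch of the general pattern only (weights held centrally, broadcast to data-parallel workers, gradients reduced and applied centrally), not Cerebras' implementation or API; `WeightStore`, `worker_gradient`, `broadcast_and_reduce`, and `train` are hypothetical names, and the model is a one-parameter linear fit.

```python
import random


class WeightStore:
    """Hypothetical stand-in for the off-chip weight store (the role MemoryX plays)."""

    def __init__(self, weight: float, lr: float):
        self.weight = weight
        self.lr = lr

    def apply(self, gradient: float) -> None:
        # The weight update happens in the store, not on the workers.
        self.weight -= self.lr * gradient


def worker_gradient(weight: float, shard: list) -> float:
    """One worker (the role of a CS-2): mean-squared-error gradient for y = w * x on its shard."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)


def broadcast_and_reduce(weight: float, shards: list) -> float:
    """Hypothetical stand-in for the fabric (the role SwarmX plays):
    send the same weight to every worker and average the gradients they return."""
    grads = [worker_gradient(weight, shard) for shard in shards]
    return sum(grads) / len(grads)


def train(num_workers: int = 4, steps: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    true_w = 3.0
    data = [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(64))]
    # Data parallelism: each worker sees a different slice of the data.
    shards = [data[i::num_workers] for i in range(num_workers)]
    store = WeightStore(weight=0.0, lr=0.5)
    for _ in range(steps):
        store.apply(broadcast_and_reduce(store.weight, shards))
    return store.weight


print(round(train(), 3))  # converges to the true weight, 3.0
```

Because the workers never hold optimizer state, changing `num_workers` only changes how the data is sliced; the training loop itself stays the same, which is the property the article attributes to weight streaming.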
Researchers Tap Into Andromeda's Supercomputing
Cerebras has given several R&D and academic institutions access to Andromeda for various applications.
Argonne National Laboratory has used Andromeda to develop gene transformers, using the GPT3-XL model and the entire COVID-19 genome. Meanwhile, JasperAI is using Andromeda to train models that will write copy for advertisements, marketing materials, and books.
Andromeda is hosted in the Colovore data center in California, and Cerebras is now opening access to additional customers.
All images used courtesy of Cerebras