Chinese Startup Biren Technology Takes on NVIDIA in GPU Market
At Hot Chips 34, a new player entered the high-performance GPU market.
For several years, the GPU market has primarily been dominated by two key players: AMD and NVIDIA. Other companies like Intel, however, have been catching up. Last week, an entirely new competitor entered the GPU market at Hot Chips 34.
The BR100 accommodates 77 billion transistors, Biren says. Image used courtesy of Biren Technology
Chinese startup Biren Technology emerged from stealth mode at the conference, announcing the release of its BR100 GPU. In this article, we’ll take a look at Biren, its first GPU offering, and how it stacks up against other top GPUs.
The GPU Design of Biren's BR100
Built on TSMC's 7nm process, the BR100 GPU adopts several modern architectural techniques, suggesting that it may be a formidable opponent for AMD and NVIDIA’s offerings. Like many of the industry’s newest flagships, such as the Apple M1 Ultra and the NVIDIA Grace CPU, the BR100 employs a dual-die design. It consists of two individual chiplets, each of which hosts 16 streaming processor clusters (SPCs) and computing blocks that are optimized for streaming applications.
Each SPC contains 16 execution units (EUs), separated into four groups of four EUs. At a lower level, each execution unit is made up of 16 streaming processor cores, which Biren calls V-Cores, as well as one tensor engine, called a T-Core. Each EU cluster shares 64 kB of L1 cache, while each SPC shares 8 MB of L2 cache.
Diagram of the BR100 architecture. Image used courtesy of Biren Technology
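The hierarchy described above can be tallied with a short Python sketch. It uses only the per-level counts quoted in this article; the class and field names are illustrative, and it assumes that an "EU cluster" means one group of four EUs. The totals it derives are arithmetic consequences of those counts, not figures published by Biren.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BR100Layout:
    """Per-level counts as quoted in the article (not an official spec sheet)."""

    dies: int = 2
    spcs_per_die: int = 16
    eus_per_spc: int = 16
    eus_per_group: int = 4        # four groups of four EUs per SPC
    v_cores_per_eu: int = 16
    t_cores_per_eu: int = 1
    l1_kb_per_eu_group: int = 64  # assumption: "EU cluster" = one group of four EUs
    l2_mb_per_spc: int = 8

    @property
    def total_spcs(self) -> int:
        return self.dies * self.spcs_per_die

    @property
    def total_eus(self) -> int:
        return self.total_spcs * self.eus_per_spc

    @property
    def total_v_cores(self) -> int:
        return self.total_eus * self.v_cores_per_eu

    @property
    def total_t_cores(self) -> int:
        return self.total_eus * self.t_cores_per_eu

    @property
    def total_l1_kb(self) -> int:
        # One shared L1 slice per EU group.
        groups = self.total_eus // self.eus_per_group
        return groups * self.l1_kb_per_eu_group

    @property
    def total_l2_mb(self) -> int:
        # One shared L2 slice per SPC.
        return self.total_spcs * self.l2_mb_per_spc


layout = BR100Layout()
print("SPCs:", layout.total_spcs)
print("EUs:", layout.total_eus)
print("V-Cores:", layout.total_v_cores)
print("T-Cores:", layout.total_t_cores)
print("L1 total (kB):", layout.total_l1_kb)
print("L2 total (MB):", layout.total_l2_mb)
```

Under these assumptions, the two dies together would hold 32 SPCs, 512 EUs, 8,192 V-Cores, and 512 T-Cores, with 256 MB of L2 cache across the package.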
The two dies are connected to one another through a proprietary die-to-die interconnect, which provides 96 GB/s of bandwidth. Altogether, the BR100 occupies an area of 1,074 mm² and features over 77 billion transistors. Biren has designed its BR100 GPU to offer massive amounts of parallelism to optimize artificial intelligence training and inference at the data center.
BR100 Performance
According to Biren, the BR100 can keep pace with some of its bigger competitors in AI inference and training, reaching peak performance of 2048 TOPS at INT8, 1024 TFLOPS at BF16, 512 TFLOPS at TF32+, and 256 TFLOPS at FP32. To reach these figures at a nominal 1 GHz clock frequency, the BR100 has a maximum thermal design power (TDP) of 550 W.
Computing performance comparison of the Biren BR100 and other flagship GPUs. Image used courtesy of Biren Technology
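To put the quoted peak figures in perspective, the following Python sketch derives three simple ratios from them: throughput per watt at the maximum TDP, operations per clock cycle at the nominal 1 GHz, and the speedup of each format relative to FP32. The function and variable names are illustrative. The inputs are Biren's claimed theoretical peaks and maximum TDP, so the results are upper-bound estimates rather than measured efficiency.

```python
# Peak throughput claimed by Biren, in tera-operations per second
# (TOPS for INT8, TFLOPS for the floating-point formats).
PEAK_TERA_OPS = {"INT8": 2048, "BF16": 1024, "TF32+": 512, "FP32": 256}
TDP_WATTS = 550            # maximum TDP quoted in the article
CLOCK_HZ = 1_000_000_000   # nominal 1 GHz clock quoted in the article


def tera_ops_per_watt(peak_tera_ops: int, tdp_watts: int) -> float:
    """Peak tera-operations per second delivered per watt of TDP."""
    return peak_tera_ops / tdp_watts


def ops_per_cycle(peak_tera_ops: int, clock_hz: int) -> int:
    """Operations the whole chip must complete each clock cycle to hit the peak."""
    return peak_tera_ops * 10**12 // clock_hz


def speedup_over_fp32(fmt: str) -> float:
    """Ratio of a format's peak throughput to the FP32 peak."""
    return PEAK_TERA_OPS[fmt] / PEAK_TERA_OPS["FP32"]


for fmt, peak in PEAK_TERA_OPS.items():
    print(
        f"{fmt:6s} peak={peak:5d}  "
        f"per-watt={tera_ops_per_watt(peak, TDP_WATTS):.2f}  "
        f"ops/cycle={ops_per_cycle(peak, CLOCK_HZ):,}  "
        f"vs FP32={speedup_over_fp32(fmt):.0f}x"
    )
```

The quoted peaks double with each step down in precision, so INT8 comes out at eight times the FP32 rate, or roughly 3.7 TOPS per watt if the chip were drawing its full 550 W.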
In benchmarking reports, Biren compares its BR100 to NVIDIA’s Ampere A100 GPU, claiming a 2.6x speedup on certain AI inference and training workloads. Benchmarking, however, can be a nebulous pursuit, especially since Biren doesn’t compare the BR100 to NVIDIA’s newest GPU offerings in the Hopper family. While the BR100 benchmarks claim a win over the A100, Biren's GPU falls short of Hopper, which significantly outperforms the A100.
Expanding China's GPU Influence
The BR100 is a notable entrant to the GPU market, which has long been dominated by only a few companies. Should its performance live up to Biren's benchmarking claims, the BR100 may be China’s most impressive GPU offering to date, signaling a broadening of the market and an expansion of geopolitical competition in the semiconductor industry.
This doesn’t make any sense. Built on TSMC’s 7nm process? Like TSMC is going to manufacture this for China when China is threatening to invade their country, Taiwan?
And, making a chip is 1/3 of the battle. The hard part is writing robust drivers like NVidia has.