Startup Enfabrica’s Accelerated Compute Fabric Addresses AI/ML in the Cloud

March 28, 2023 by Dale Wilson

The networking silicon and software company emerged from stealth mode today at the inaugural MemCon, where it is introducing new Accelerated Compute Fabric (ACF) devices with the goal of addressing interface bottlenecks and scalability challenges in AI computing.

Unless you have been living under a rock for the past six months, you have witnessed the massive growth in AI applications—ChatGPT, Dall·E, Bard, and Bing, just to name a few. ChatGPT alone is estimated to have reached 100 million monthly users only two months after launch. 


The Network I/O Bottleneck

This explosion in AI traffic creates bottlenecks in networks and distributed computing infrastructure. When Enfabrica was founded back in 2020, its team believed that scaling the performance and capacity of modern high-performance distributed computing was limited by I/O. Memory capacity was growing rapidly and GPU performance was increasing exponentially, but I/O was not keeping pace, as illustrated in the figure below.



Network I/O performance is not keeping pace with GPU computing performance. Image used courtesy of Enfabrica


Enfabrica CEO Rochan Sankar explained to All About Circuits that the high-level challenge with AI is that “it is pumping so much data in and out of the server nodes through a 100 or 200 Gig NIC, a tiny element that was designed originally for pairing with a CPU.” He went on to elaborate on three problems associated with this fundamental issue:

  1. A significant stranding of resources. While CPUs are well virtualized, other expensive resources like GPUs and memories are underutilized. 
  2. The existing stack of I/O devices is inefficient because it was built for a different set of needs. 
  3. Other companies creating product solutions are using “more proprietary or siloed” methods, as opposed to industry standards like Ethernet, PCI, and CXL.


Addressing the Growing Network I/O Problem

Enfabrica co-founders Rochan Sankar and Shrijeet Mukherjee have decades of combined network infrastructure experience at industry stalwarts like Broadcom, Google, and Cisco. As Sankar explained to All About Circuits, they were not the only ones who recognized the looming I/O problem, but they believed they had a better way to approach it. Armed with those ideas, they built a team and set to work on disrupting an interconnect silicon market estimated to soon be worth $20 billion.

At the heart of its design, Enfabrica is looking to replace multiple tiers of network infrastructure with its accelerated compute fabric (see the figure below). Sankar explained that the Enfabrica architecture “acts as a hub and spoke model” that can “disaggregate and scale any arbitrary compute resource.” He went on to explain, “Whether that's a CPU, GPU, accelerator, memory, or Flash, they can all connect to this hub (that) acts effectively as an aggregated I/O fabric device for them.”
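As a mental model only, the hub-and-spoke aggregation Sankar describes can be sketched in a few lines of Python. Every name here (`FabricHub`, `attach`, `allocate`) is hypothetical and invented for illustration; this is not Enfabrica's API, just a toy picture of heterogeneous resources pooling behind a single fabric device instead of sitting behind dedicated per-CPU I/O paths.

```python
# Toy hub-and-spoke model of an aggregated I/O fabric (illustrative only;
# all names are hypothetical, not Enfabrica's actual interfaces).

class FabricHub:
    """Aggregates heterogeneous resources behind one 'hub' device."""

    def __init__(self):
        self.pools = {}  # resource kind -> list of free units

    def attach(self, kind, unit):
        # Any resource -- CPU, GPU, accelerator, memory, flash -- connects
        # to the same hub rather than to its own dedicated I/O path.
        self.pools.setdefault(kind, []).append(unit)

    def allocate(self, kind):
        # Provision a unit on demand; unused capacity stays in the shared
        # pool for other consumers instead of being stranded behind one host.
        pool = self.pools.get(kind, [])
        return pool.pop() if pool else None

hub = FabricHub()
for gpu in ["gpu0", "gpu1"]:
    hub.attach("gpu", gpu)
hub.attach("memory", "cxl-dimm0")

print(hub.allocate("gpu"))     # -> gpu1
print(hub.allocate("memory"))  # -> cxl-dimm0
```

The point of the sketch is the shape, not the code: one shared pool per resource type, reachable by any consumer, is what lets a disaggregated design relieve the stranding problem described above.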



Accelerated compute fabric device aims to collapse multiple network tiers for improved performance. Image used courtesy of Enfabrica


In addition to the challenge of introducing a new hardware architecture into these systems, Sankar pointed out that you cannot change the software layer. “It takes a whole lot of effort to make that work to begin with. So introducing hardware technologies or networking technologies that force that to change is actually quite problematic.” Enfabrica aims for its hardware to operate “with the same interfaces and the same API set that exists today.”  


Industry Standards and Open Source

While other companies, including industry giant Nvidia, are tackling this networking problem with proprietary interface solutions, Enfabrica uses industry standards like PCIe and CXL in conjunction with open-source software frameworks. 

Sankar quickly pointed out that they are “providing an alternate way to scale.” He believes that “Nvidia is going to be at the heart of this ecosystem for quite some time.” So, they are not expecting to replace Nvidia, but augment existing solutions. “We can add a tier of high capacity memory” that customers can “leverage to scale very large language models.”


ACF First Silicon

The first generation of the accelerated compute fabric switch (ACF-S), illustrated in the following figure, is being fabricated at TSMC on its 5 nm FinFET process, which was developed, in part, for high-performance computing applications like this.



Enfabrica first-generation multi-Tbps server fabric silicon IC architecture. Image used courtesy of Enfabrica


The ACF-S is designed to deliver multi-terabit switching and bridging between heterogeneous compute and memory resources in a single silicon die, without changing the physical interfaces, protocols, or software layers above device drivers. Sankar described the switching chip as a "sandwich" of layers: "high-performance Ethernet switching pipelines, a large shared buffer, what we call a terabit NIC copy engine, and high-performance PCIe Gen5 and CXL 2.0+ switching."

Sankar continued, "These accelerated computing fabric products are designed to create elastic pools of resources that can be networked and provisioned on demand, to create much more flexible instances. And that is a huge factor in being able to scale to meet the demands of next-gen workloads and to do it in a way that's sustainable in terms of the total cost of ownership."


Supercomputer Performance With Cloud Economics

If your goal is to disrupt a $20 billion industry, you had better come armed with some major improvements. In our discussion, Sankar highlighted a number of the benefits they foresee with the accelerated compute fabric:

  • 10X scaling of AI clusters, from hundreds of nodes to thousands of nodes
  • 2X improvement in I/O bandwidth per $
  • Relief of GPU, DRAM, and SSD stranding to improve utilization
  • Up to 75% lower node-to-node latency
  • Up to 50% reduction in AI cluster total cost of ownership
  • At least 10% reduction in the power consumption of GPU racks and clusters
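Taken at face value, those ratios compose into a simple back-of-envelope calculation. The baseline figures below are invented purely for illustration; only the multipliers come from Enfabrica's stated claims.

```python
# Back-of-envelope application of the claimed ratios. Baseline numbers are
# invented for illustration; only the multipliers come from Enfabrica.

baseline_cost_per_tbps = 100.0   # arbitrary baseline: $ per Tbps of I/O
baseline_latency_us = 10.0       # arbitrary baseline node-to-node latency
baseline_tco = 1.0e6             # arbitrary baseline cluster TCO, $

cost_per_tbps = baseline_cost_per_tbps / 2        # "2X I/O bandwidth per $"
latency_us = baseline_latency_us * (1 - 0.75)     # "up to 75% lower latency"
tco = baseline_tco * (1 - 0.50)                   # "up to 50% lower TCO"

print(cost_per_tbps, latency_us, tco)  # -> 50.0 2.5 500000.0
```

Whether the ratios hold at scale is, of course, exactly what customers will be evaluating once silicon ships.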


If they are successful, Sankar believes that Enfabrica can help “bridge the world between high-performance supercomputing and cloud-scale distributed computing” with AI as the central workload driving these requirements. “People want cloud economics, but they want supercomputer performance.”

Enfabrica is planning to have product-specific announcements later this year. In the meantime, if you are at MemCon today, you can check out the Enfabrica presentation, “Breaking Data Movement Chokepoints in Distributed Computing,” by Shrijeet Mukherjee, co-founder and Chief Development Officer, at 4:25 PM.