News

NVIDIA and Microsoft Join Forces on Massive Cloud AI Computer

November 28, 2022 by Darshil Patel

Last week, Microsoft and NVIDIA announced a partnership to develop what they describe as one of the most powerful, massively scalable AI supercomputers.

In 2019, Microsoft partnered with OpenAI to build a cloud-based AI supercomputer that runs on the Azure cloud. It was one of the top five publicly disclosed supercomputers in the world. In the same year, Microsoft shared its vision for "AI at Scale" and began working on a new class of technologies for training models with more than 100 billion parameters.

 

Microsoft's "AI at Scale" approach. Screenshot courtesy of Microsoft (source: https://blogs.microsoft.com/ai-for-business/ai-at-scale-explained/)

 

To support more ambitious AI workloads, Microsoft has now partnered with NVIDIA in a multi-year collaboration to build a new supercomputer that runs on Azure's supercomputing infrastructure and is powered by NVIDIA GPUs (graphics processing units), networking platforms, and AI software. The new supercomputer will help accelerate advances in generative AI and allow enterprises to train and deploy large-scale models.

Meanwhile, Microsoft will leverage the Transformer Engine in NVIDIA's H100 GPUs to optimize DeepSpeed, its deep learning optimization library for training transformer-based large language models, reducing energy consumption and memory usage during AI training. The two companies will also offer a full stack of AI workflows and software development kits to Azure enterprise customers.

 

The New Cloud AI Computer

The new cloud-based AI computer features scalable Azure ND- and NC-series virtual machines optimized for AI workloads. At the hardware level, it links thousands of NVIDIA GPUs over the NVIDIA Quantum InfiniBand networking platform; on the software side, it runs the NVIDIA AI Enterprise suite.

 

NVIDIA H100 Tensor Core GPU. Image used courtesy of NVIDIA

 

Microsoft Azure's AI virtual machines now integrate NVIDIA's advanced GPUs and networking platforms, so Azure customers can deploy thousands of GPUs in a single cluster to train large-scale models. The companies claim these services enable generative AI at scale.
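To make the cluster-scale training concrete, the following is a minimal sketch of data-parallel training with DeepSpeed, Microsoft's open-source training library discussed below. The model, batch size, and configuration values are illustrative assumptions, not details of the Azure deployment.

    # Sketch: data-parallel training with DeepSpeed. All values are
    # illustrative; nothing here is specific to the Azure deployment.
    import torch
    import deepspeed

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
    )

    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "fp16": {"enabled": True},          # mixed precision on A100-class GPUs
        "zero_optimization": {"stage": 2},  # shard optimizer state across GPUs
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }

    # deepspeed.initialize sets up the distributed communication (NCCL over
    # InfiniBand on clusters like the one described here) behind one call.
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )

    x = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
    loss = engine(x).float().pow(2).mean()  # stand-in loss for the sketch
    engine.backward(loss)                   # DeepSpeed handles loss scaling
    engine.step()

On a real cluster, a script like this would be started with the deepspeed launcher, which spawns one process per GPU across all nodes.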

The current Azure virtual machines feature NVIDIA Quantum 200 Gb/s InfiniBand networking with NVIDIA A100 GPUs; they will be upgraded to Quantum-2 400 Gb/s InfiniBand networking and NVIDIA H100 GPUs. The H100's fourth-generation Tensor Cores and Transformer Engine, using FP8 precision, deliver up to nine times faster training than the A100.

 

AI at Scale and Generative AI

AI at Scale is Microsoft's approach to parallel computing for rapidly training large-scale machine learning models. Large, centralized AI models are advantageous because they need to be trained only once, on supercomputing infrastructure with massive datasets. They can then be fine-tuned for different tasks and domains and scaled and reused across various products, as the sketch below illustrates.
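As a rough illustration of this train-once, fine-tune-many pattern (the code below is not Microsoft's; it uses the open-source Hugging Face Transformers API, and the model and dataset are stand-ins), fine-tuning adapts an already pretrained model to a downstream task with comparatively little compute:

    # Sketch: fine-tune a pretrained transformer for one downstream task.
    # Model and dataset are illustrative stand-ins for a large pretrained model.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # reuse pretrained weights, new task head
    )

    dataset = load_dataset("imdb")  # example downstream task: sentiment analysis
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
        batched=True,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    )
    trainer.train()  # only this small step repeats per task; pretraining ran once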

With its 17-billion-parameter Turing Natural Language Generation (T-NLG) model, Microsoft demonstrated the ability to detect fine-grained nuances in language. However, training such models requires clusters of thousands of AI-accelerated machines interconnected by a high-bandwidth network spanning an entire data center. Microsoft notes that such clusters in Azure enable new natural language generation and understanding capabilities across Microsoft products.

 

T-NLG outperforms other models on many downstream NLP tasks. Image used courtesy of Microsoft

 

Now, with NVIDIA's H100 Transformer Engine and Quantum-2 InfiniBand networking platforms, the two companies aim to accelerate DeepSpeed. The new hardware enables 8-bit floating-point (FP8) precision, doubling the throughput of DeepSpeed's 16-bit operations. The NVIDIA AI Enterprise software suite also streamlines AI workloads, from generative AI to frameworks for areas such as cybersecurity.
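As a sketch of how FP8 is exposed to training code, NVIDIA's open-source Transformer Engine library wraps supported layers in an FP8 autocast context. The example below is illustrative, not taken from the announcement, and requires Hopper-class hardware such as the H100:

    # Sketch: FP8 compute via NVIDIA's Transformer Engine library (requires an
    # H100-class GPU). Layer and batch sizes are illustrative placeholders.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    layer = te.Linear(4096, 4096, bias=True).cuda()
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

    x = torch.randn(8, 4096, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(x)  # the matrix multiply executes in 8-bit floating point
    out.sum().backward()

Because FP8 values are half the width of FP16 values, each Tensor Core operation can process twice as much data, which is where the doubled throughput comes from.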

The ND A100 v4 Azure virtual machine series and its clusters are now in preview and will become a standard offering, allowing clusters of any size to be configured, from eight to thousands of interconnected NVIDIA GPUs spread across hundreds of virtual machines.