Microsoft ignites chip journey with AI accelerators
The company used its annual Ignite conference to showcase the work it is doing to optimise AI and make more energy-efficient hardware
As it ramps up its artificial intelligence (AI) strategy, Microsoft has developed a custom semiconductor, the Maia AI Accelerator, which it will use internally for its Copilot and Azure OpenAI service. There are also plans to make the technology available to Azure customers.
Among the main drivers for developing Maia are greater efficiency and optimisation when servers based on the new semiconductor are deployed in Microsoft Cloud datacentres.
According to Brian Harry, a Microsoft technical fellow leading the Azure Maia team, the Maia 100 AI Accelerator has been designed specifically for the Azure hardware stack. He claimed this vertical integration – which aligns chip design with the larger AI infrastructure designed for Microsoft’s workloads – can yield huge gains in performance and efficiency.
Sam Altman, CEO of OpenAI, said: “Since first partnering with Microsoft, we’ve collaborated to co-design Azure’s AI infrastructure at every layer for our models and unprecedented training needs. We were excited when Microsoft first shared their designs for the Maia chip, and we’ve worked together to refine and test it with our models. Azure’s end-to-end AI architecture, now optimised down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers.”
Along with the Maia AI Accelerator, the company also unveiled the Microsoft Azure Cobalt CPU, an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud. It is also expanding industry partnerships to provide more infrastructure options for customers. These include a new NC H100 v5 Virtual Machine Series built for Nvidia H100 Tensor Core GPUs, and the addition of the Nvidia H200 Tensor Core GPU to its fleet next year to support larger model inferencing with no increase in latency.
Microsoft has also added AMD-powered AI hardware to its portfolio in ND MI300 virtual machines, which it said are designed to accelerate the processing of AI workloads for high-range AI model training and generative inferencing. ND MI300 uses AMD’s latest GPU, the AMD Instinct MI300X.
Scott Guthrie, executive vice-president of Microsoft’s Cloud + AI Group, said: “Microsoft is building the infrastructure to support AI innovation, and we are reimagining every aspect of our datacentres to meet the needs of our customers. At the scale we operate, it’s important for us to optimise and integrate every layer of the infrastructure stack to maximise performance, diversify our supply chain and give customers infrastructure choice.”
Read more about AI hardware efficiency
- Researchers in Belgium are working on a new generation of platforms to support artificial intelligence applications in an energy-efficient manner.
- Nvidia and Intel go from competitors to partners by jointly developing a system designed to handle heavy AI workloads and offer users significant energy savings.
There is growing evidence that the compute required to run AI workloads, and large language models in particular, consumes huge amounts of energy.
In February, research firm SemiAnalysis suggested OpenAI required 3,617 of Nvidia’s HGX A100 servers, with a total of 28,936 graphics processing units, to support ChatGPT, implying an energy demand of 564 MWh per day. The research estimated that ChatGPT costs $694,444 per day to run.
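Those headline figures are straightforward to sanity-check. The sketch below is a rough back-of-the-envelope calculation, not taken from the SemiAnalysis report itself: it assumes eight GPUs per HGX A100 server and an average draw of roughly 6.5 kW per fully loaded server, both of which are illustrative assumptions.

```python
# Back-of-the-envelope check of the SemiAnalysis figures quoted above.
# Assumptions (ours, not the report's): 8 GPUs per HGX A100 server and
# an average draw of about 6.5 kW per fully loaded server.

SERVERS = 3_617
GPUS_PER_SERVER = 8          # assumed: one HGX A100 chassis holds 8 GPUs
AVG_SERVER_POWER_KW = 6.5    # assumed average draw per server, in kW

total_gpus = SERVERS * GPUS_PER_SERVER
daily_energy_mwh = SERVERS * AVG_SERVER_POWER_KW * 24 / 1_000

print(f"GPUs: {total_gpus:,}")                          # ~28,936
print(f"Energy per day: {daily_energy_mwh:,.0f} MWh")   # ~564 MWh
```

Under those assumptions the arithmetic lands close to the reported 28,936 GPUs and 564 MWh per day, which is consistent with the scale of the cost estimate.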
A recent New York Times article quoted a researcher who predicted that the electricity needed to run AI could increase the world's carbon emissions significantly, although this would depend on whether the datacentres involved are powered by fossil fuels or renewable energy sources.
Wes McCullough, Microsoft's corporate vice-president of hardware product development, said choosing Arm technology was a key element of the company's sustainability goals.
“The architecture and implementation is designed with power efficiency in mind,” he said. “We’re making the most efficient use of the transistors on the silicon. Multiply those efficiency gains in servers across all our datacentres, it adds up to a pretty big number.”