CTO interview: Europe benefits from energy gains in AMD chips

AMD’s chief technology officer explains how the latest chip technology can help European organisations solve the energy puzzle facing IT departments


When it comes to computing power, Europe is facing what at first glance seems to be a paradox. Performance must continue to increase, while power consumption must decrease. 

“Over the next decade, energy efficiency is going to rise to top priority,” says Mark Papermaster, chief technology officer at AMD.

“It doesn’t mean the performance of the computing systems is any less important. We have to improve the performance at least at the same pace as the old Moore’s law, which means doubling it roughly every two years.”

In fact, the demand for computing power is increasing at a much higher rate than ever before. Artificial intelligence (AI) algorithms, for example, will require 10 times more computing power every year for the foreseeable future.
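To see how quickly those two rates diverge, here is a minimal back-of-envelope sketch in Python comparing the two figures quoted above (the five-year horizon is an assumption for illustration):

# Compare the two growth rates quoted above: Moore's-law doubling
# every two years versus AI demand growing 10x per year, both from
# the same baseline of 1x.
for year in range(6):
    moore = 2 ** (year / 2)        # doubles every two years
    ai_demand = 10 ** year         # grows 10x every year
    print(f"year {year}: Moore {moore:5.1f}x, AI demand {ai_demand:,}x")

After five years, Moore's-law scaling yields less than a 6x gain, while the quoted AI trajectory implies 100,000x.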

According to Papermaster, energy consumption will be the limiting factor as the rate of performance increases in future generations. AMD is rising to the challenge with what he calls a “holistic” design approach. 

Transistor density still matters. It may not be on a par with Moore’s law, but chip manufacturers will continue to cram more transistors into each new generation of semiconductors.

More transistors provide more functionality and performance. But frequency – how fast the transistors run – will no longer increase as much as it has in the past, and the price per transistor is going up. Taken together, the pillars of classic Moore’s law scaling are eroding. Transistors themselves will still improve with each new generation, however, and AMD is taking that further by marrying its improved transistors with new design techniques for how computation is carried out.


AMD also plans to innovate around how it packages accelerators with central processing units (CPUs). It has already done so with graphics processing units (GPUs). AMD has GPUs for both gaming and content distribution. It also has GPUs specifically designed for the datacentre – to accelerate AI training and inference.

On 4 January 2023, AMD announced it had started packaging AI inference accelerators into its Radeon 7000 series. It also packages accelerators with CPUs for embedded devices.

“Across the entire product line, you need to think about accelerators that you can build in heterogeneously – and to do that, you have to invest in new ways of integrating these accelerators in with the processor,” says Papermaster. “There have been advances in packaging technology and we have been building partnerships to benefit from those advances.”

AMD has also increased the bandwidth between chiplets, resulting in performance gains and lower energy consumption. Furthermore, integrating CPUs and GPUs in the same package virtually eliminates the energy cost of moving data between them.
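To make that concrete, here is a rough sketch of the energy saved by keeping traffic on-package; the pJ/bit figures and traffic volume are illustrative assumptions, not AMD numbers:

# Illustrative cost of moving data, using assumed order-of-magnitude
# figures: off-package memory traffic is often quoted in the tens of
# pJ/bit, while an on-package chiplet link can be around 1 pJ/bit.
PJ = 1e-12                                    # joules per picojoule

def transfer_watts(gigabytes_per_s, pj_per_bit):
    bits_per_s = gigabytes_per_s * 8e9
    return bits_per_s * pj_per_bit * PJ       # joules/s = watts

traffic = 100.0                               # GB/s, assumed workload
print(f"off-package: {transfer_watts(traffic, 20.0):.1f} W")
print(f"on-package:  {transfer_watts(traffic, 1.0):.1f} W")

Under those assumptions, the same 100 GB/s of traffic costs 16 W off-package but less than 1 W once CPU and GPU share a package.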

Finally, AMD is partnering with application developers, using information about how applications work to design and package semiconductors to meet specific needs. Transaction processing, for example, has different needs to AI. And even in AI, there is a big difference in the processing that trains models and the processing that runs the resulting trained models anywhere from deep in the cloud to the smallest device on the edge. 

“The applications a customer runs affect the kind of solution they put together,” says Papermaster. “How you use computing power on-premise or on-cloud makes a difference. At AMD, we are adding acceleration across our portfolio and enabling our customers to apply the right computing solution based on the customer need. And in the semiconductor industry, we’re going to leverage holistic design to still maintain that historic exponential pace of advanced computing capabilities, generation after generation.”

In September 2021, AMD announced its “30 by 25” initiative, based on heterogeneous computing and holistic design. It committed to a 30-times improvement in the energy efficiency of accelerated datacentre computing by 2025, compared with 2020.

A 30-times improvement means that by 2025, the power required for an AMD accelerated compute node to complete a single calculation will be 97% lower than in 2020. Last year, Papermaster wrote in a blog post that the company was on track to meet that goal.
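The 97% figure follows directly from the arithmetic of a 30-times gain:

# A 30x efficiency improvement means each unit of work needs 1/30
# of the 2020 energy, a reduction of 1 - 1/30.
reduction = 1 - 1 / 30
print(f"{reduction:.1%}")                     # 96.7%, rounding to 97%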

Addressing needs specific to Europe

Europe is ahead of the rest of the world in understanding the need for energy-efficient computing. Part of the reason is that the cost of power is much higher in Europe, but another reason is the high level of awareness around sustainability.  

An example of what AMD is doing in Europe is its partnership with LUMI, the supercomputer hosted in Finland. The heterogeneous supercomputer was built on AMD MI250X GPUs.

Finnish researchers have already used LUMI to develop a large Finnish language model with 13 billion parameters. Now, AMD is working with LUMI and the Allen Institute for AI to develop a fully general-purpose 70 billion-parameter large language model by the end of the year.
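For a sense of the scale involved, here is a minimal sketch of the memory that the model weights alone imply; the bytes-per-parameter precisions are common choices but are assumptions here:

# Rough footprint of model weights: parameter count times bytes per
# parameter. 16-bit weights are a common default; 8-bit quantisation
# halves the footprint.
def weights_gb(params_billions, bytes_per_param):
    return params_billions * bytes_per_param  # 1e9 params * B / 1e9 B/GB

for params in (13, 70):
    print(f"{params}B parameters: {weights_gb(params, 2):.0f} GB at 16-bit, "
          f"{weights_gb(params, 1):.0f} GB at 8-bit")

At 16-bit precision, the 70 billion-parameter model needs roughly 140 GB for its weights alone – more than a single GPU holds, which is why such models are trained across many accelerators.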

“When you run ChatGPT, it might take up to 10 seconds for an answer,” says Papermaster. “That typically runs on about eight GPUs and one CPU. So, currently, over the course of a day, when you think about 10 million ChatGPT queries a day, it uses about as much power as you would need to power over 5,000 homes. It’s tremendous power consumption and we’re just starting.

“Part of the holistic design is figuring out how to be more efficient,” he says. “How do you reduce the model size, rather than having billions of parameters? If I have a specific task that I’m trying to do for an AI application, can I shrink the size of that model and still have sufficient accuracy for the task I have at hand?

“We’re already seeing that innovation,” says Papermaster. “It’s out there. People are innovating in all kinds of ways about how to make this more efficient, more targeted, so we don’t hit the power barrier. And you can do more specialised algorithms, and it could be more efficient.

“I think it’s an innovation-rich environment,” he continues. “It’s just going to spur new thinking on everything from silicon all the way up through the application space – and on the model size and optimisations we make going forward. This is going to accelerate the holistic design, as I call it.”
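Papermaster’s homes comparison can be reproduced as a back-of-envelope calculation. The query time, node composition, query volume and homes figure are his; the wattages and per-home consumption below are illustrative assumptions:

# Back-of-envelope check of the quoted comparison. Assumed figures:
# ~500 W per GPU, ~300 W for the host CPU, ~24 kWh/day per home.
node_w = 8 * 500 + 300            # eight GPUs plus one CPU, in watts
query_s = 10                      # seconds per query, as quoted
queries_per_day = 10_000_000      # as quoted

joules = node_w * query_s * queries_per_day
kwh_per_day = joules / 3.6e6      # 1 kWh = 3.6 million joules
homes = kwh_per_day / 24          # assumed 24 kWh/day per home
print(f"{kwh_per_day:,.0f} kWh/day, roughly {homes:,.0f} homes")

Under those assumptions, the arithmetic lands at about 119,000 kWh a day, or roughly 5,000 homes, in line with the figure Papermaster cites.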
