Prepare to deploy custom hardware to speed up AI
Latest forecasts suggest spending on artificial intelligence is ramping up, and organisations that need raw machine learning performance are turning to custom hardware
Spending on artificial intelligence (AI) across Europe is set to grow by 49% in 2019 compared with 2018, as organisations begin using the technology to gain a competitive advantage, according to IDC.
Andrea Minonne, senior research analyst at IDC Customer Insight & Analysis in Europe, said: “Many European retailers, such as Sephora, Asos and Zara, as well as banks such as NatWest and HSBC, are already experiencing the benefits of AI – including increased store visits, higher revenues, reduced costs, and more pleasant and personalised customer journeys.
“Industry-specific use cases related to automation of processes are becoming mainstream and the focus is set to shift toward next-generation use of AI for personalisation or predictive purposes.”
There is industry consensus that a traditional CPU-based computer architecture is generally not up to the task of running machine learning algorithms. Today, graphics processors offer the performance needed to run current machine learning applications.
But the web giants that require even greater levels of performance are now developing custom AI acceleration hardware. For instance, in February the FT reported that Facebook was developing its own chip for machine learning.
Facebook joins Google, which announced its custom AI chip three years ago. In 2016, Google unveiled the tensor processing unit (TPU), a custom application-specific integrated circuit (Asic) it had built specifically for machine learning and tailored to TensorFlow, its deep neural network (DNN) framework.
At the time, Norm Jouppi, distinguished hardware engineer at Google, wrote: “We have been running TPUs inside our datacentres for more than a year, and have found them to deliver an order of magnitude better-optimised performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future [three generations of Moore’s Law].”
Google’s TPU is available on the Google Cloud Platform (GCP). The top-end v2-512 Cloud TPU v2 Pod is currently being tested and costs $422.40 per pod slice per hour.
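Developers do not program the Asic directly; access is through TensorFlow itself. A minimal sketch of targeting a Cloud TPU, assuming recent TensorFlow 2.x and a hypothetical TPU resource named "my-tpu" provisioned in the same GCP project:

```python
import tensorflow as tf

# Locate the Cloud TPU ("my-tpu" is a hypothetical resource name)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Replicate the model across all cores of the TPU (or pod slice)
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# model.fit(...) then runs on the TPU rather than the local CPU
```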
Asics are exceedingly expensive and limited, because they are designed to run a single application, such as the TensorFlow DNN framework in the case of Google’s TPU. Microsoft Azure instead offers acceleration using field programmable gate arrays (FPGAs), which, according to Microsoft, provide performance close to that of Asics.
“They are also flexible and reconfigurable over time, to implement new logic,” the company said. Its hardware-accelerated machine learning architecture, dubbed Brainwave, is built on Intel FPGA devices and, Microsoft said, “enables data scientists and developers to accelerate real-time AI calculation”.
Acceleration with GPUs
Arguably, graphics processing units (GPUs) are the entry point for most organisations looking to deploy hardware to accelerate machine learning algorithms. According to Nvidia, GPUs fit well with the need to train deep neural networks for AI applications.
“Because neural networks are created from large numbers of identical neurons, they are highly parallel by nature,” it said. “This parallelism maps naturally to GPUs, which provide a significant speed-up over CPU-only training.”
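That parallelism is visible even in a toy benchmark. The sketch below (sizes are illustrative, assuming PyTorch built with CUDA support on a machine with an Nvidia GPU) times the dense matrix multiplications that dominate neural network training, first on the CPU and then on the GPU:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096, reps: int = 10) -> float:
    """Time repeated n-by-n matrix multiplications on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    start = time.perf_counter()
    for _ in range(reps):
        c = a @ b  # the dense linear algebra at the heart of DNN training
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the queued GPU kernels to finish
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.2f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.2f}s")  # typically an order of magnitude faster
```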
Jos Martin, senior engineering manager and principal architect for parallel computing tools at MathWorks, said: “Without the advent of GPUs and the fast computation that they bring, we would not be seeing the current explosion in this area. AI developments and GPU computing go hand in hand to accelerate each other’s growth.”
Among the advances in GPU technology over the past few years is support for what is known in computer science as “mixed-precision algorithms”, said Martin. These run most of a computation in fast, low-precision arithmetic such as FP16, reserving full precision for the numerically sensitive steps.
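A minimal sketch of what this looks like in practice, assuming PyTorch’s automatic mixed precision (AMP) API on a CUDA GPU; the model, data and optimiser below are placeholders:

```python
import torch

model = torch.nn.Linear(512, 10).cuda()          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()             # rescales FP16 gradients

inputs = torch.randn(64, 512, device="cuda")     # placeholder batch
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():                  # run the forward pass in FP16 where safe
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()                    # scale the loss to avoid FP16 underflow
scaler.step(optimizer)
scaler.update()
```

On GPUs with tensor cores, such as the V100, the FP16 portions of the computation run on dedicated hardware, which is where much of the speed-up comes from.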
GPUs for machine learning are also readily accessible in the cloud. Amazon’s EC2 P3 instances, for example, offer up to eight Nvidia V100 tensor core GPUs and up to 100Gbps of networking throughput for $31.22 per hour.
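Provisioning such an instance comes down to a single API call. A minimal sketch, assuming the boto3 AWS SDK for Python; the AMI ID and key pair name are hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a p3.16xlarge, the eight-V100 configuration mentioned above.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: e.g. an AWS Deep Learning AMI
    InstanceType="p3.16xlarge",
    KeyName="my-key-pair",             # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```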
The catch is that the data needs to be in the cloud for machine learning processing. Where regulations or the size of the dataset prohibit this, a number of organisations have built their own GPU-based machine learning accelerators.
One example is Tinkoff Bank in Moscow, which has built its own supercomputer to support its strategy of developing a platform for machine learning and AI. Called the Kolmogorov cluster, it is believed to be the eighth-largest supercomputer in Russia.
The hardware, comprising 10 nodes with Nvidia Tesla V100 accelerators powered by tensor cores, provides up to 658.5 TFLOPS of peak double-precision floating-point (FP64) performance.
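As a rough sanity check on that figure (the per-node GPU count below is an assumption, not a detail given by the bank): a Tesla V100 peaks at about 7.8 TFLOPS in FP64, so eight GPUs in each of the 10 nodes would account for most of the headline number, with the host CPUs plausibly supplying the remainder:

```python
V100_FP64_TFLOPS = 7.8   # peak FP64 throughput of an SXM2 Tesla V100
GPUS_PER_NODE = 8        # assumption: DGX-1-style nodes with eight GPUs each
NODES = 10

gpu_peak = NODES * GPUS_PER_NODE * V100_FP64_TFLOPS
print(f"GPUs alone: {gpu_peak:.1f} TFLOPS of the quoted 658.5 TFLOPS peak")
# GPUs alone: 624.0 TFLOPS of the quoted 658.5 TFLOPS peak
```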
The bank said the AI-acceleration hardware took just 24 hours to retrain a sales probability forecasting model on its entire 13-year set of accumulated data. It estimated that a traditional computing approach would have needed six months to run the same model.
Quantum computing could also have a role to play in the future of machine learning acceleration. As Computer Weekly has previously reported, researchers at the Massachusetts Institute of Technology (MIT) and Oxford University, along with researchers from IBM’s Q division, have published a paper detailing an experiment that shows how quantum computing could accelerate feature mapping, a technique for identifying unique attributes in data that is used in tasks such as recognising someone’s face.
The researchers are looking to identify which datasets would represent a good fit for quantum-based AI acceleration.
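The classical idea behind feature mapping gives a sense of what the quantum version is trying to speed up. The toy sketch below (a classical illustration using numpy, not the quantum circuit from the paper) lifts 2D points into a higher-dimensional space where two previously inseparable classes become separable:

```python
import numpy as np

def feature_map(x: np.ndarray) -> np.ndarray:
    """Lift a 2-D point into 3-D by appending its squared distance from the origin."""
    return np.array([x[0], x[1], x[0] ** 2 + x[1] ** 2])

rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(100, 2))   # class 0: a blob around the origin
ring = rng.normal(size=(100, 2))
outer = 3.0 * ring / np.linalg.norm(ring, axis=1, keepdims=True)  # class 1: a circle of radius 3

# No straight line separates a ring from the blob inside it in 2-D, but in the
# mapped space the third coordinate alone splits the classes with one threshold.
z_inner = np.apply_along_axis(feature_map, 1, inner)[:, 2]
z_outer = np.apply_along_axis(feature_map, 1, outer)[:, 2]
print(z_inner.max() < z_outer.min())  # True: linearly separable after the mapping
```

The quantum approach explored in the paper uses a quantum circuit to compute feature maps that would be hard to evaluate classically; which datasets benefit from this is exactly what the researchers are trying to establish.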