Why run AI on-premise?
Much of artificial intelligence’s rapid growth has come from cloud-based tools. But there are very good reasons to host AI workloads on-premise
Artificial intelligence (AI) services such as OpenAI’s ChatGPT, Microsoft’s Copilot and Google’s Gemini run in the cloud. But when it comes to enterprise deployments, AI is by no means cloud-only.
Advances in technology, the development of “small language” and open source models, performance requirements, and the benefits of locating AI close to data sources all create situations that favour on-premise architecture. Then there are considerations around security, data privacy, safeguarding intellectual property and cost. In each case, there is a strong argument for on-premise AI, even as most of the industry’s attention is on cloud solutions.
“Most enterprises currently run their AI workloads in the cloud, driven by the significant advantages of scalability, cost efficiency and rapid deployment that cloud platforms like AWS [Amazon Web Services], Microsoft Azure and Google Cloud offer,” says Derreck Van Gelderen, head of AI strategy at PA Consulting.
“These cloud providers have developed comprehensive ecosystems that allow companies to bypass large upfront infrastructure costs and instead access flexible resources which are ideal for handling the high computational demands of AI – and now generative AI [GenAI] – models, especially during the resource-intensive training phases,” he adds.
John Gasparini, cloud strategy and technology lead at KPMG, sees similar trends. “Certainly, the majority of clients I’m working with are using cloud-based AI services to test out some of these early use cases,” he says. “They are taking advantage of some of the large language or foundation models that are out there, or building their own models on top of these cloud services.”
Building in-house AI capabilities, he suggests, requires “significant capital investment”, but the return on investment (ROI) from AI is not yet guaranteed.
Cloud infrastructure allows organisations to build AI systems quickly, but also to scale down projects that fail to work out. The cloud also gives ready access to sophisticated models, including the latest-generation large language models (LLMs). Many of the leading GenAI models are, for now at least, only available in the cloud.
But the cloud does have its limitations. And some of these limitations become more of a burden as enterprises expand their use of AI – either in terms of the breadth of tasks it carries out, or by linking it to more sensitive data.
Limitations of AI in the cloud
The limitations of cloud-based AI largely mirror the drawbacks of cloud computing for other enterprise applications: data sovereignty, security, growing regulations and cost.
“The advantage of public cloud is that you can test ideas. If they don’t work, you can turn it off, and you’ve not got large write-off costs to deal with at that point,” says KPMG’s Gasparini.
But as AI projects grow, so do bills. “I’ve certainly had conversations with clients recently who are starting to look at how to get visibility of AI costs,” he adds.
As with any other cloud application, firms need to understand how to predict and manage those costs. And for AI, costs can increase with larger data volumes for training and with more users making more queries through AI tools.
“The cloud can scale and at a short-term cost point that works really well,” suggests Grant Caley, UK and Ireland solutions director at technology supplier NetApp. “But as soon as you leave stuff there, [including] the data itself, you’ve got to pay for that. It becomes a cost argument quite quickly.”
In addition, if a business uses vector databases for AI projects – and most do – industry figures suggest they could need as much as 10 times the space occupied by the original data. This quickly increases costs.
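A rough, hypothetical calculation shows why. The sketch below is not a benchmark; every figure – chunk count, chunk size, embedding dimensions and index overhead – is an assumption chosen purely for illustration:

```python
# Back-of-envelope sketch of how vector embeddings inflate storage.
# Every figure here is an assumption for illustration, not a measured benchmark.

CHUNKS = 1_000_000        # text chunks indexed for retrieval (assumed)
CHUNK_BYTES = 2_000       # ~2 KB of raw text per chunk (assumed)
EMBEDDING_DIM = 1536      # dimensions per embedding; varies by model (assumed)
BYTES_PER_FLOAT = 4       # float32 storage per dimension
INDEX_OVERHEAD = 1.5      # index structure and metadata on top of raw vectors (assumed)

raw_gb = CHUNKS * CHUNK_BYTES / 1e9
vector_gb = CHUNKS * EMBEDDING_DIM * BYTES_PER_FLOAT * INDEX_OVERHEAD / 1e9

print(f"Raw text:        {raw_gb:.1f} GB")
print(f"Vector database: {vector_gb:.1f} GB ({vector_gb / raw_gb:.1f}x the source data)")
```

On those assumptions, the vector store is already several times larger than the text it was built from; add replication, higher-dimensional embeddings or separate indexes per use case, and the multiple climbs quickly towards the figures quoted above.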
Data sovereignty, privacy and security are also reasons for moving from the cloud to on-premise AI. “Some of the primary challenges that organisations wrestle with are data privacy and sovereignty,” cautions PA Consulting’s Van Gelderen. “This is especially critical in sectors such as defence, nuclear, healthcare and other highly regulated organisations that need robust control over data.”
Performance, too, can be an issue. “Latency is another concern, particularly for applications requiring real-time or low-latency responses, such as autonomous systems or edge-based solutions,” he says. “Delays introduced by transmitting data to and from cloud servers can be a limiting factor.”
Moving AI in-house
The cloud’s limitations are prompting at least some enterprises to run AI in-house, or to look to on-premise options as their AI operations grow. And this links to the type of AI enterprises run, the location of data sources, and the differing needs of AI’s training and inference phases.
“Nowadays, when most people refer to AI, 90% of the time they are thinking about GenAI technologies,” says PA Consulting’s Van Gelderen. “Generative AI and LLMs, however, are only a part of the broader AI landscape and have distinct infrastructure needs compared to ‘traditional’ AI – for example, machine learning classification and regression models, and other subsets like natural language processing and computer vision.”
This suggests organisations will need more than one approach to technology for AI. In addition, the growing importance of retrieval-augmented generation (RAG) adds another layer of complexity. Enterprises are using RAG to add their own business context to AI model output. This can give rise to results that are more sensitive, or need more security, than the raw results from a large language model.
“It feels like RAG has become non-negotiable for enterprises using generative AI in their own environments,” says Pure Storage’s field chief technology officer for EMEA, Patrick Smith.
“Firstly, it overcomes most, if not all, of the challenges with hallucinations. Secondly, it gives you the ability to use your own data with generative AI without having to do any tuning. Thirdly, it allows you to overcome the tough problem of not being able to use current data without retraining [the model]. So, the currency of insights is addressed as well,” he adds.
But this affects the infrastructure needed to run AI. According to Smith, it impacts performance and “data gravity”. The best place to locate data, he suggests, is driven less by the large language model than by vector databases.
“It’s defining where the overall solution sits, which is then influencing people to pull AI solutions from the public cloud back into their own datacentre,” he says. “As soon as you’ve gone down a vector database and RAG approach, then you want the model next to your vector database.”
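A minimal sketch of that pattern, assuming the open source sentence-transformers and FAISS libraries, illustrates why the two sit together. The documents and question are placeholders, and the final call to a locally hosted model is left as a stand-in:

```python
# Minimal RAG sketch for an on-premise stack: embed internal documents,
# retrieve the most relevant ones, and build a grounded prompt for a local LLM.
# Assumes sentence-transformers and faiss-cpu are installed; the data is illustrative.
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    "Refunds are processed within 14 days of a return being received.",
    "All customer data must remain in the UK datacentre.",
    "The on-premise GPU cluster is maintained by the infrastructure team.",
]

# 1. Embed the documents and load them into a local vector index
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, convert_to_numpy=True)
faiss.normalize_L2(doc_vectors)
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(doc_vectors)

# 2. At query time, retrieve the chunks closest to the question
question = "Where does customer data have to be stored?"
query_vector = embedder.encode([question], convert_to_numpy=True)
faiss.normalize_L2(query_vector)
_, hits = index.search(query_vector, 2)
context = "\n".join(documents[i] for i in hits[0])

# 3. The augmented prompt would then go to a locally hosted, open source model
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # stand-in for the call to the local model
```

The retrieval step touches the sensitive data on every query, which is why, once the vector database lives in the datacentre, there is a strong pull to host the model alongside it rather than ship context to the cloud and back.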
Nor do enterprises always need the latest, cloud-based, generative AI models. There is growing interest in open source LLMs, such as Meta’s Llama.
Models that can run on less powerful hardware are emerging from companies such as France’s Mistral, alongside sector-specific models.
And researchers are also working on small language models. These could be better suited to handling the most sensitive data, and easier to run in-house. Eventually, these models could run on an industry-standard server, or even a powerful laptop. But these options are quite a different proposition to running current-generation LLMs in-house, especially during the training and tuning phases.
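As an indication of how lightweight this can be, the sketch below uses the Hugging Face transformers library to run a small, open-weight model on local hardware. The model name is only an example of a sub-billion-parameter checkpoint and could be swapped for any locally hosted alternative:

```python
# Sketch: running a small open-weight language model entirely in-house.
# Assumes the transformers (and accelerate) libraries are installed;
# the model name below is illustrative, not a recommendation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small enough for a single server GPU, or CPU
    device_map="auto",                   # use local accelerators where available
)

prompt = "In one sentence, why might a firm keep AI inference on-premise?"
output = generator(prompt, max_new_tokens=60, do_sample=False)
print(output[0]["generated_text"])
```

Serving a model at this scale is a far cry from training one, which is where the practical considerations below start to bite.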
Running AI in-house – practical considerations
Enterprises looking to run AI workloads in-house need to weigh up the technical requirements and upfront costs for IT infrastructure against ongoing and potentially rising costs for the cloud.
“Running AI workloads on-premise presents several challenges, including high hardware costs, power and cooling requirements, and ongoing maintenance demands. Building an infrastructure capable of supporting large-scale GenAI models is capital-intensive,” warns PA Consulting’s Van Gelderen. “In the training phase, where large datasets and immense processing power are needed, cloud environments often provide a clear advantage.”
Firms also need to consider whether they have the datacentre space, power and components required.
Specialist AI hardware, especially graphics processing units (GPUs), is expensive and can be hard to obtain. Hyperscalers and their cloud AI customers have access to GPUs in volume. “The demand for [GPU] chipsets outstrips supply,” says KPMG’s Gasparini. “Therefore, there’s very little left for corporates to consume.”
For on-premise AI implementations, enterprises might need to look at less resource-intensive models that can run on existing hardware.
But there are also efficiency arguments for doing so. In the inference stage, AI models might well be running constantly, making them more economical to run in-house, provided enterprises have the datacentre capacity.
“Putting things back into the datacentre is a good thing to do from a cost profile, particularly if they [the models] are going to be running all the time,” suggests NetApp’s Caley. “If you’re only going to spin up a bunch of GPUs for 10 hours to do a project, then maybe the cloud is better for that.”
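The underlying arithmetic is easy to sketch. None of the figures below is a quoted price – the cloud rate, hardware cost and running cost are all assumptions for illustration:

```python
# Break-even sketch: renting a cloud GPU versus buying comparable hardware.
# All figures are assumptions for illustration, not quoted prices.

cloud_gpu_per_hour = 3.00       # cost to rent one GPU in the cloud, per hour (assumed)
onprem_capex = 30_000.00        # purchase price of a comparable GPU server (assumed)
onprem_run_per_hour = 0.50      # power, cooling and maintenance per hour (assumed)

# Hours of use at which owning becomes cheaper than renting
break_even_hours = onprem_capex / (cloud_gpu_per_hour - onprem_run_per_hour)
print(f"Break-even after ~{break_even_hours:,.0f} hours of use "
      f"(~{break_even_hours / (24 * 365):.1f} years if running 24/7)")
```

On those assumptions, a model serving inference around the clock pays back the hardware in little more than a year, while a 10-hour experiment never comes close – which is exactly the distinction Caley draws.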
Pure Storage’s Smith agrees. “It’s cheap to fail in the cloud, but it’s expensive to succeed,” he says. “Do your prototyping there – you can throw it all away if it doesn’t go according to plan. But when you roll it into production, because you’ve proven your ROI, you’ve proven that it’s a valuable business service, and you want to be focused on the potential costs.”
That, ultimately, is likely to prompt organisations to find AI models that will work with the IT infrastructure they have or can afford to build, rather than rely on the cloud for their longer-term AI strategy.
Read more articles about on-premise AI
- On-premises storage, infrastructure evolves to keep pace with AI: Pure Storage is the latest infrastructure vendor to add Nvidia DGX SuperPod certification and new product offerings to support generative AI workloads.
- Why the AI era requires an intelligent data infrastructure: NetApp aims to build infrastructure to 'bridge the chasm' that exists between cloud-based AI models and an organization's on-premises data management environment.