LLMs explained: A developer’s guide to getting started
A guide to help enterprise developers use large language models securely, efficiently and cost-effectively in their applications
Since large language models (LLMs) and generative AI (GenAI) are increasingly being embedded into enterprise software, barriers to entry – in terms of how a developer can get started – have almost been removed.
There are plenty of off-the-shelf products, such as the various Microsoft Copilot offerings, that target business user productivity. For software developers, Microsoft also has GitHub Copilot, which is designed to speed up coding by auto-completing code and offering suggestions that help developers write it more quickly.
Access through application programming interfaces (APIs) to public cloud-based services such as ChatGPT enables developers to incorporate powerful AI chatbots into their own applications. Developers whose organisations are customers of modern enterprise software, such as products from Salesforce, Workday, Oracle or SAP, among others, will also have access to enterprise AI capabilities powered by LLMs.
The only caveats are data privacy and intellectual property protection. While a developer can easily get started trying out the tools that are readily available on the public cloud, effective training requires high-quality, domain-specific data.
There are troves of such datasets in corporate data warehouses, but to prevent data leakage, no corporate data should ever be transferred to a public LLM unless the developer has been authorised to make that data public.
Developers should also be wary of using personally identifiable information with LLMs, as moving such data into an LLM for training could breach data privacy regulations. The best advice is to ensure the data required for training and testing is compliant with corporate data policies.
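One common precaution is to mask obvious identifiers before any text leaves the organisation. The following is a minimal sketch of that idea, not a complete solution: the `redact_pii` function and its two regular expressions are illustrative assumptions, and they only catch email addresses and phone-like numbers. Names, addresses and other identifiers pass through untouched, so a real deployment would need a far more thorough approach and a review against corporate data policies.

```python
import re

# Illustrative patterns only: they catch common email and phone formats,
# but miss names, postal addresses, account numbers and much else.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
    # The name "Jane" is left in place, which shows the limits of this sketch.
    print(redact_pii(sample))
```

Note that the phone pattern is deliberately loose and will also mask other long digit sequences, such as dates or order numbers, which may or may not be acceptable for a given dataset.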
That’s why there’s a lot of interest in organisations building their own private LLMs. In practice, such systems work best if they can combine the vast amount of information that can be gleaned from public LLMs with commercially sensitive and proprietary data held in enterprise IT systems.
How to get started with LLMs
There are a number of LLMs with easy-to-access APIs that developers can harness to start building AI-infused applications. Developers need to decide whether to use an open LLM or a proprietary one.
Proprietary API-accessible models are generally licensed based on usage, and the developer simply signs up to a subscription based on their usage requirements. Usage is measured and priced in what the industry calls “tokens”, based on the volume of text sent to or received from the LLM. This means costs can increase rapidly if the models are used extensively. According to Ilkka Turunen, field chief technology officer (CTO) at Sonatype, the calculations for these requests are not always straightforward, and an intimate understanding of the payload is required.
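To see how token-based pricing adds up, a rough cost model can help. The sketch below assumes that input and output tokens are priced separately per 1,000 tokens, which is a common arrangement but not a universal one. The `TokenPricing` class, the function names and every price in the example are hypothetical; real figures must come from the provider's price list, and real token counts depend on the provider's tokeniser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices per 1,000 tokens, in any one currency."""
    input_per_1k: float
    output_per_1k: float


def estimate_request_cost(input_tokens: int, output_tokens: int,
                          pricing: TokenPricing) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = (input_tokens / 1000) * pricing.input_per_1k
    output_cost = (output_tokens / 1000) * pricing.output_per_1k
    return input_cost + output_cost


def estimate_monthly_cost(requests_per_day: int, avg_input_tokens: int,
                          avg_output_tokens: int, pricing: TokenPricing,
                          days: int = 30) -> float:
    """Scale the average per-request cost up to a month of traffic."""
    per_request = estimate_request_cost(avg_input_tokens, avg_output_tokens, pricing)
    return per_request * requests_per_day * days


if __name__ == "__main__":
    # Made-up prices purely for illustration.
    pricing = TokenPricing(input_per_1k=0.01, output_per_1k=0.03)
    monthly = estimate_monthly_cost(5000, 2000, 500, pricing)
    print(f"Estimated monthly cost: {monthly:,.2f}")
```

Because the prompt, any attached context and the response all count towards the total, the payload size per request matters as much as the number of requests.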
Open models are generally much less expensive in the long term than proprietary LLMs because no licensing fees are involved. But developers looking at open source models also need to take into account the costs involved in training and running them on public clouds or using on-premise datacentre servers that are optimised for AI workloads.
Open models include LLaMA2 from Meta, BERT from Google and Falcon-40B from the Technology Innovation Institute in Abu Dhabi. There are a large number of open models available, and to help developers understand their benefits and drawbacks, Hugging Face has created a leaderboard of open source LLMs that uses the EleutherAI Language Model Evaluation Harness, a unified framework for testing generative language models.
What hardware is needed for LLM training
LLMs require significant computing resources. For instance, in 2023, Sharada Yeluri, technologist and senior director of engineering at Juniper Networks, posted an article on LinkedIn which showed that with 2048 Nvidia A100 graphics processing units (GPUs), training LLaMA2 on a vocabulary of 32,000 words would take 21 days.
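The scale of such a training run is easier to grasp when expressed in GPU-hours. The short sketch below simply multiplies out the figures quoted above (2,048 GPUs for 21 days). The `gpu_hours` and `rental_cost` functions and the hourly price are hypothetical, and the calculation assumes every GPU is busy for the whole period.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours, assuming all GPUs run continuously for the period."""
    return num_gpus * days * 24


def rental_cost(total_gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Cost of renting that capacity at a flat, hypothetical hourly rate."""
    return total_gpu_hours * price_per_gpu_hour


if __name__ == "__main__":
    hours = gpu_hours(num_gpus=2048, days=21)
    print(f"{hours:,.0f} GPU-hours")  # 1,032,192 GPU-hours

    # The hourly price below is a made-up placeholder, not a quoted rate.
    print(f"Estimated rental cost: {rental_cost(hours, 2.0):,.0f}")
```

Even at a modest hourly rate, a million GPU-hours puts training a model of this size from scratch beyond most enterprise development budgets.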
The leading PC server companies are all offering servers that are optimised for AI workloads. These servers are preconfigured as clusters with fast interconnects that link the GPUs efficiently to deliver scalable performance.
Some LLMs clearly make more efficient use of hardware than others. The Hugging Face leaderboard is one of the places developers can go when researching the IT resource requirements of different LLMs. There are others, including an open collaboration on GitHub.
It’s also entirely feasible to run smaller models that are trained on less data and, as a consequence, require far less computational power. Some of these can be made to run on a reasonably high-performance laptop or desktop PC, configured with AI chips.
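A rough way to judge whether a model will fit on a given machine is to estimate the memory needed just to hold its weights: the number of parameters multiplied by the bytes used to store each one. The sketch below illustrates that rule of thumb. The `weight_memory_gb` function and the seven-billion-parameter example are assumptions chosen for illustration, and the estimate excludes the additional memory needed at run time for activations and caches.

```python
def weight_memory_gb(num_parameters: float, bits_per_parameter: int) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    total_bytes = num_parameters * bits_per_parameter / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 7e9  # hypothetical seven-billion-parameter model
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

On this estimate, the same hypothetical model needs 14GB at 16-bit precision but 3.5GB at 4-bit, which is why reduced-precision versions of smaller models are the ones most often run on laptops and desktop PCs.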
Common pitfalls to avoid
AI systems tend to be non-deterministic, which has implications for how decision-making AI systems are engineered and tested. If the data used in training is not complete, this will lead to biases and flawed assumptions when the AI system is presented with real-world data. Developers need to fine-tune models and refine them with techniques such as hyperparameter tuning to achieve optimal results.
LLMs rely on high-quality training data. If the data is incomplete, inconsistent or missing certain demographics, the models may produce flawed or biased answers.
LLMs can sometimes produce answers that sound plausible but are incorrect or fabricated. This phenomenon is known as hallucination.
Using LLMs with business intelligence
While public LLMs are trained on a huge amount of public data, they don’t have access to the inner workings of a business. An inference engine based on public data is likely to miss the nuances of a specific domain within the confines of an organisation and the information flows powering its business processes.
When LLMs are used in decision-making systems, the developer also needs to consider the question of explainability. Proprietary LLMs are rather like black boxes, which makes it hard to decipher how the inference engine comes up with answers to an input question.
To avoid data leakage, many IT leaders ban or limit the use of public LLMs. The public data can be used in inference applications, but the outputs from the LLM need to be combined with company-specific information that resides in enterprise IT systems.
A sound information management strategy is key, with guardrails to ensure the consistency and integrity of data and to avoid data leakage. One place to start is the data stored in commercial off-the-shelf enterprise applications. Many of these software packages incorporate LLMs.
Oracle, for instance, is offering a way for its customers to use their own, private data to “fine-tune” public LLMs, delivering results that are specific to that organisation. The company has recently unveiled GenAI agents for Oracle Cloud Infrastructure. Vinod Mamtani, Oracle’s vice-president and general manager for GenAI services, said: “We don’t require customers to move their data outside the data store to access AI services. Instead, we bring AI technology to where our customers’ data resides.”
Rival SAP is also linking LLMs with enterprise data sources. The SAP Hana Cloud multimodal database includes a vector database engine, which allows organisations to combine the capabilities of LLMs with enterprise data to answer queries.
Juergen Mueller, CTO of SAP, said: “Large language models bring sparks of intelligence, but they also have severe limitations. They have no idea what happened in the past one or two years, and they have no access to any business data, so it’s hard to deploy them in production.”
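The general pattern of combining an LLM with enterprise data through vector search can be sketched in a few lines. The example below is a toy illustration, not how SAP, Oracle or any other supplier implements it: the `embed` function is a stand-in that counts words, where a real system would call an embedding model and store the vectors in a vector database, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents,
                    key=lambda doc: cosine_similarity(query_vec, embed(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Place the retrieved company text in front of the user's question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")


if __name__ == "__main__":
    docs = [
        "Invoice 1001 was paid on 3 March",
        "The travel policy allows economy flights only",
        "Quarterly revenue grew in the northern region",
    ]
    print(build_prompt("What does the travel policy allow?", docs, top_k=1))
```

The resulting prompt would then be sent to the LLM, so the model answers from current company records rather than from whatever it saw during training. Retrieval happens next to the data, and only the selected extracts are passed on.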
Making the business case for developing with LLMs
According to analyst Forrester, one opportunity to use an LLM is for improving operational efficiency, such as in finance and accounting to reduce external auditing fees. Every chief financial officer wants to reduce external auditor billable hours. LLMs can answer auditor questions, and reduce the hours and internal staff required to gather the information.
Auditors also see a way to use LLMs to help them work more efficiently. PwC, for instance, has developed a tax AI assistant tool, which has been trained on, and cross-references, case law, legislation and other underlying sources, together with its own UK-based IP.
According to PwC, the data is regularly refreshed to reflect changes and updates to tax rules. It claims the model delivers significantly higher quality and accuracy in the tax domain than publicly available LLMs, and provides references to underlying data, allowing for transparent and accurate validation by tax professionals.
Five things to consider when developing applications with large language models
- There are security risks and data privacy considerations. Is your corporate data security policy being infringed if you put data into a public LLM?
- Proprietary LLMs are like black boxes, which makes it difficult to audit them for explainability. Will the application you are developing require an audit trail that shows how the LLM came up with its answers?
- AI systems tend to require huge amounts of computational resources. Will you need to buy AI-optimised hardware to train and run inference applications? What are the cost implications of using AI hardware in the public cloud?
- The power of these models means it is necessary to consider ethical questions on how they will be deployed.
- The copyright implications of public LLMs have yet to be resolved. Does your organisation have guardrails in place to ensure your intellectual property is not being infringed by public LLMs?