LLM series - Chainguard: Why developer 'trust' in AI images matters (a lot)

This is a guest post for the Computer Weekly Developer Network written by Dan Lorenc, CEO and co-founder of software supply chain security company Chainguard.

Having co-created the open source software signing project Sigstore while at Google, as well as the SLSA security framework, Lorenc sees AI images as one of the most compelling problem domains for software supply chain security in 2024.

CEO Lorenc writes in full as follows…

Secure AI vs. the race to weaponise

Attackers will always go after the lowest-hanging fruit, and the explosion of AI software packages and LLMs has created large-scale typosquatting attack opportunities.

These attacks work by mimicking legitimate AI images and software packages, creating a ‘DOS’ for developers, who have to filter through a lot of noise to find the genuine artifact.
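
To make the idea concrete, here is a minimal sketch (in Python, with made-up package names) of the kind of lookalike check a registry or CI pipeline could run. It illustrates the attack pattern; it is not a substitute for verifying signatures.

```python
# Minimal sketch: flag requested package names that sit a character or two away
# from well-known AI packages, the pattern typosquatters rely on.
# The package lists here are illustrative, not a vetted registry feed.

from difflib import SequenceMatcher

KNOWN_AI_PACKAGES = {"torch", "tensorflow", "transformers", "langchain", "keras"}

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_suspects(requested: list[str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return (requested_name, lookalike_of) pairs that are close to, but not exactly, a known name."""
    suspects = []
    for name in requested:
        for known in KNOWN_AI_PACKAGES:
            if name.lower() != known and similarity(name, known) >= threshold:
                suspects.append((name, known))
    return suspects

if __name__ == "__main__":
    # "tensorfow" and "transformerz" are made-up lookalikes for demonstration only.
    print(flag_suspects(["tensorfow", "requests", "transformerz"]))
```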

Developer authenticity, verified through signed commits and packages, and getting open source artifacts from a source or vendor you can trust are the only real long-term defences against these Sybil-style attacks on AI artifacts. Attackers will use AI to stand up more convincing typosquatting repositories and automate the expansion of these fake AI software artifacts, while developers try to use AI to scale the discovery of security bugs and CVEs.
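
As a rough illustration, assuming the Sigstore cosign CLI (v2, keyless verification) is installed and using placeholder image and identity values, a build step might refuse to use an AI base image whose signature cannot be tied back to a trusted builder:

```python
# Sketch: gate a build or deployment on a successful keyless signature verification.
# Assumes the Sigstore `cosign` CLI (v2.x) is on the PATH; the image reference,
# identity and issuer below are placeholders, not real values.

import subprocess
import sys

IMAGE = "registry.example.com/ml/pytorch-base:latest"  # placeholder image reference
EXPECTED_IDENTITY = "https://github.com/example-org/images/.github/workflows/release.yaml@refs/heads/main"  # placeholder
OIDC_ISSUER = "https://token.actions.githubusercontent.com"

def verify_image(image: str) -> bool:
    """Return True only if cosign verifies a signature tied to the expected builder identity."""
    result = subprocess.run(
        [
            "cosign", "verify",
            "--certificate-identity", EXPECTED_IDENTITY,
            "--certificate-oidc-issuer", OIDC_ISSUER,
            image,
        ],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    if not verify_image(IMAGE):
        sys.exit(f"refusing to use unverified image: {IMAGE}")
    print(f"signature verified for {IMAGE}")
```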

But security teams will find that AI simultaneously surfaces a lot of poorly vetted CVEs that ‘DOS’ the NVD and triage teams – the AI security equivalent of ‘noisy pager’ syndrome, where it’s difficult to separate legitimate vulnerabilities from noise, creating a lot of thrashing for security teams that are aiming to reduce known vulnerabilities to zero.
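
One way teams cut that noise is to triage scanner output down to findings that are both severe and actually fixable. The following is a minimal sketch, using an illustrative report structure rather than any particular scanner’s output format:

```python
# Sketch of CVE triage: keep only findings that are severe AND have a fix available,
# so the team works a short, actionable list instead of the whole noisy feed.
# The report structure and CVE IDs below are fabricated for illustration.

from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    severity: str               # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"
    fixed_version: str | None   # None when no patched release exists yet

def actionable(findings: list[Finding]) -> list[Finding]:
    """Filter to findings a team can act on today."""
    return [
        f for f in findings
        if f.severity in {"HIGH", "CRITICAL"} and f.fixed_version is not None
    ]

if __name__ == "__main__":
    report = [
        Finding("CVE-0000-0001", "CRITICAL", "2.4.1"),
        Finding("CVE-0000-0002", "LOW", None),
        Finding("CVE-0000-0003", "HIGH", None),
    ]
    for f in actionable(report):
        print(f.cve_id, "->", f.fixed_version)
```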

Ultimately, this signal-versus-noise challenge will accelerate the use of hardened, minimal container images for AI, reducing the volume of exploitable packages and making it easier for security teams to reason about the turf they are protecting and for developer teams to build AI-driven software securely from the start.

Clean base images 

Clean base images will now become basic AI security hygiene, but why?

Recent exploits like PoisonGPT demonstrated how new attack techniques can abuse the recursive dependencies in AI images for popular frameworks such as TensorFlow, PyTorch, Kubeflow and others.

Chainguard CEO Lorenc: In 2024, we’ll see more LLMs being selected based on trustworthiness.

When developers install a base image, they are trusting not only the people they download the software from but also the security of the dependencies that software uses. What this means for developers is new scrutiny of the cruft that often ships with AI images, and purposefully choosing images that ship with the desired AI libraries and functionality and no additional transitive dependencies. This basic AI security hygiene of reducing the base image to the desired components and nothing beyond removes whole classes of recursive dependencies that can be backdoored to gain access to the massive datasets that organisations are using to train their AI models.
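
A rough sketch of what that scrutiny can look like inside a Python-based AI image: list what is actually installed and diff it against what you meant to ship. The allowlist here is purely illustrative.

```python
# Sketch: inside a Python-based AI image, list every installed distribution and
# flag anything that is not on the intended allowlist -- a quick way to spot cruft
# and unexpected transitive dependencies. The allowlist is an example only.

from importlib import metadata

INTENDED = {"torch", "numpy", "pip", "setuptools"}   # what we meant to ship

def installed_distributions() -> dict[str, str]:
    """Map of installed distribution name -> version."""
    return {
        dist.metadata["Name"].lower(): dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    }

def unexpected(installed: dict[str, str]) -> dict[str, str]:
    """Everything present in the image that was not on the intended list."""
    return {name: ver for name, ver in installed.items() if name not in INTENDED}

if __name__ == "__main__":
    extras = unexpected(installed_distributions())
    for name, version in sorted(extras.items()):
        print(f"unexpected: {name}=={version}")
```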

LLM provenance

How do we establish the trustworthiness of AI systems themselves? Cryptographic signatures, trusted computing and AI systems running on trusted hardware will all contribute to how LLMs can expose more security transparency.

The end game needs to be a way for developers to track models in transparency logs – tamper-proof records of ‘provenance’ (aka chains of custody), including records of which training model was used, who created it, how it was trained and who had access to it.
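
Production transparency logs are append-only services (Sigstore’s Rekor, for example), but the tamper-evidence idea can be sketched as a simple hash chain in which each provenance record commits to the one before it. The field values below are placeholders mirroring the questions above.

```python
# Toy hash-chained provenance log for a model: each entry commits to the previous
# entry's hash, so rewriting history changes every later hash and is detectable.
# Real systems use an append-only transparency log service; this is only an
# illustration of the tamper-evidence idea, with placeholder field values.

import hashlib
import json

def entry_hash(record: dict, previous_hash: str) -> str:
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log: list[dict], record: dict) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    log.append({"record": record, "prev": prev, "hash": entry_hash(record, prev)})

def verify(log: list[dict]) -> bool:
    prev = "0" * 64
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

if __name__ == "__main__":
    log: list[dict] = []
    append(log, {"model": "example-llm-7b", "created_by": "ml-team@example.com"})
    append(log, {"event": "trained", "dataset": "internal-corpus-v3", "access": ["ml-team"]})
    print("log verifies:", verify(log))

    log[0]["record"]["created_by"] = "attacker@example.com"   # tamper with history
    print("after tampering:", verify(log))
```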

In 2024, we’ll start to see more LLMs being selected based on their trustworthiness, and these types of verifiable provenance records will quickly become the trust mechanisms.