Cloudflare eyes GenAI workloads with Workers AI

Cloudflare’s Workers developer platform is touted as making it easier for organisations to deploy GenAI capabilities at the edge and speed up inferencing

Cloudflare’s Workers developer platform has enabled it to build new capabilities quickly, closing the gap with some of its closest rivals. Now, it’s hoping the platform will do the same for organisations that want to deploy generative AI (GenAI) capabilities without worrying about the underlying infrastructure.

In September 2023, the company launched Workers AI, an AI inference-as-a-service platform that enables organisations to run AI models at edge locations with minimal coding requirements, powered by its global network of graphics processing units (GPUs).

Ricky Robinett, Cloudflare’s vice-president of developer relations and community, said the service caters to AI workloads that are too big to run on a device, but not big enough to be deployed on a server farm in the cloud.

“One use case we’ve seen is something like a news or content site that recommends other content for you to read with a linked summary of the content,” Robinett told Computer Weekly on a recent visit to Singapore.

“You can also take something like Stable Diffusion and put that closer to the user and decrease the latency.” 

Cloudflare has made pretrained models such as Meta’s Llama 2, Mistral AI’s Mistral 7B, OpenAI’s Whisper and Hugging Face’s distilbert-sst-2-int8 available through Workers AI. The company plans to expand this list and hasn’t ruled out allowing customers to run their own models on the platform.

“Mistral is great, but sometimes you don’t need a whole large language model from a cost and performance perspective,” said Robinett. “Sometimes you have a task that is very specific and so we have folks asking us about running a very narrow model close to their users so it can be really quick.”

Workers AI maintains privacy by default, which means the models are not trained on customer data. For the results from the models to be meaningful and useful to users, Cloudflare has developed a vector database called Vectorize that organisations can use to store and generate embeddings for user questions.
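To illustrate the general idea behind an embeddings lookup, the following is a minimal in-memory sketch, not Vectorize's actual API. It assumes content and user questions have already been converted to embedding vectors by some model; the `store` contents, the three-dimensional vectors and the `nearest` helper are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(
    query: List[float], store: Dict[str, List[float]], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Rank stored items by similarity to the query embedding (hypothetical helper)."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stored embeddings; a real embedding model produces hundreds of dimensions.
store = {
    "article-a": [1.0, 0.0, 0.0],
    "article-b": [0.0, 1.0, 0.0],
    "article-c": [0.7, 0.7, 0.0],
}

# Hypothetical embedding of a user question.
question = [0.9, 0.1, 0.0]
best_match = nearest(question, store)[0][0]
```

A production vector database would replace the linear scan with an index, but the ranking principle of comparing a question's embedding with stored ones is the same.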

To improve the resiliency, scalability and performance of their AI applications, developers can also leverage Cloudflare’s AI Gateway, which helps prevent data loss and enables fallback to an alternative model to manage costs and address rate limits.

“For example, if you’re using GPT-4, but it isn’t working, or maybe you got rate-limited, you can fall back to Anthropic and have this workflow that lets developers move along as they need with a simple proxy,” said Robinett.
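The fallback workflow Robinett describes can be sketched in a few lines. This is an illustration of the pattern only, not AI Gateway's implementation or configuration: the `ProviderError` exception, the `complete_with_fallback` function and the two stand-in model callables are hypothetical, and no real provider is called.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider callable raises on failure or rate limiting."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in the list.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def primary_model(prompt: str) -> str:
    """Stand-in for a primary model that is currently rate-limited."""
    raise ProviderError("rate limited")


def secondary_model(prompt: str) -> str:
    """Stand-in for an alternative model that responds normally."""
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "Summarise this article",
    [("primary", primary_model), ("secondary", secondary_model)],
)
```

Keeping the provider list as ordered data means a proxy could reorder or extend it without changes to the calling application.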

The GPUs employed by Workers AI have been deployed across 100 points-of-presence in Cloudflare’s network and will be available across all of its sites by the end of 2024.

Robinett said the demand for the service has been fuelled by GPU shortages, with developers looking at running inferencing workloads in a more cost-effective way. Cloudflare had foreseen this demand when initially building its infrastructure.

“When we put in our servers, we left an extra space for the GPUs for years,” he said. “The nice thing is we didn’t have to rebuild all our servers to put in the GPUs – we just had to open them up and there was already a slot ready for us.”
