API series - OctoML: ML APIs need to take a lesson from their ancestors

This is a guest post for the Computer Weekly Developer Network written by Jason Knight in his role as chief product officer at OctoML — the company is known for its work bringing DevOps agility to ML deployment to enable developers and IT operations to build AI-powered apps.

Knight writes as follows…

We’ve all seen the amazing feats that modern machine learning is capable of. But these are merely the tip of the iceberg. The unsung heroes of machine learning are the smaller models that make existing software work better – often much better – and enable small new experiences that were not previously possible.

But building intelligent applications that combine machine learning with traditional software engineering still involves a great deal of pain, sweat and tears.

Much of that pain is due to the lack of stable, robust APIs for machine learning.

Bottlenecks & technical debt

The first mainstream deep learning frameworks, such as Theano and Caffe, were originally created to give data scientists APIs to define and then train models on example datasets. Deployment was often not considered at all, or was left as an afterthought, since these frameworks were written by and for the academic machine learning community.

Later, TensorFlow and then PyTorch increased the flexibility and capabilities available to data scientists, and PyTorch’s close embrace of the Python interpreter and language as the primary ML API enabled large steps forward in ergonomics and flexibility for the data scientist.

But these benefits come at a cost: PyTorch’s Python-language-as-model-definition approach makes shipping models to other production stacks or devices challenging.
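
To make that coupling concrete, here is a minimal sketch of the problem, assuming a recent PyTorch install (the Gated module and its names are hypothetical illustrations, not anything from OctoML):

```python
# Why Python-coupled model definitions are hard to ship: the model's
# behaviour depends on arbitrary Python control flow, which a traced
# export cannot fully capture.
import torch
import torch.nn as nn

class Gated(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Data-dependent Python branching: fine in eager mode...
        if x.sum() > 0:
            return self.linear(x)
        return -self.linear(x)

model = Gated().eval()
example = torch.ones(1, 4)

# torch.jit.trace records only the branch taken for this example input,
# so the exported artefact silently drops the other code path (PyTorch
# emits a TracerWarning here).
traced = torch.jit.trace(model, example)

# torch.jit.script compiles the control flow itself, but only for the
# subset of Python it supports; code outside that subset must be rewritten.
scripted = torch.jit.script(model)
```

Either path demands extra engineering work before the model can leave the Python process it was born in, and that work is precisely the gap between development and production APIs.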

This contributes to the struggles of moving machine learning models from development to production. See the blog post by PyTorch creator Soumith Chintala recounting a brief history of PyTorch’s tradeoffs, or the infamous and still highly applicable Google paper, “Machine Learning: The High Interest Credit Card of Technical Debt”, for a deeper dive into these tradeoffs and challenges.

To deal with the resulting complexity of coupling Python code with ML model definitions, PyTorch code is often ‘thrown over the wall’ from an organisation’s development teams to the ops or production teams responsible for porting, maintaining, or deploying that code onto production APIs and systems.

Does this sound familiar to anyone who did software development in the days before we had APIs to automatically test, provision and monitor deployed software?

Enabling ML Devs

To accelerate the advancement of ML to power the intelligent applications of tomorrow, we must make it easier for data scientists to deploy their own code by giving them tools whose development APIs match production APIs, whether in the cloud or on the edge.

The only way to do that is by building better abstractions (APIs) and platforms that still retain the flexibility that developers enjoy today, but also enable hardware portability and performance without manual porting/integration effort.

We are beginning to see the initial signs of this with libraries and tools that abstract complexity better, empowering users to do more with less. Examples include HuggingFace’s transformers library and BentoML.
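
As a flavour of what such abstraction looks like in practice, here is a minimal sketch using the transformers pipeline API (assuming the transformers library is installed; the default model it downloads and the example output shown are illustrative):

```python
# One call hides tokenisation, model loading, inference and
# post-processing behind a single, stable API surface.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Deploying this model was surprisingly painless."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```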

We’re also seeing end-to-end machine learning platforms (aka hosted ML API offerings). These platforms can be helpful to people just starting out in the space, since they make ML development APIs and ML hosted APIs harmonious by construction, but it remains to be seen whether they become the predominant way to do machine learning.

One interesting historical data point for comparison comes from classical software engineering, where end-to-end development platforms like Heroku have seen mild success, but where most software development is still done on a mixture of hosted and non-hosted solutions that teams combine in different ways.

Another possibility for how ML developer APIs will be brought into closer alignment with production APIs is through the rise of foundational models – a smaller set of large, flexible models created by the community. These foundational models are distributed freely by a small set of sophisticated ML hubs and are then fine-tuned or prompt-engineered to a given end.
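
The prompt-engineering half of that workflow can be sketched in a few lines, again with transformers (gpt2 is used here purely as a small, convenient stand-in; real foundational models are far larger and far more capable):

```python
# Steering a shared, pre-trained model towards a task with text alone,
# with no retraining or redeployment of the model itself.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French.\n"
    "English: cheese\nFrench: fromage\n"
    "English: hello\nFrench:"
)
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```

The appeal is that one model artefact serves many such tasks, so the production API around it only has to be built and hardened once.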

This might narrow ML engineering workflows enough to simplify the problem of aligning development APIs and production APIs. The distribution of flexible foundational model building blocks also has analogies to the rise of open source playbooks and APIs, such as the LAMP stack in early web programming, SQLite in embedded data storage, or MPI and later Kubernetes for distributed programming.

But only time will tell if consolidation around a smaller set of foundational models (hence workflows and APIs) will outpace diversification as ML continues to develop and specialise.

What can ML Devs do today?

For those of you building intelligent applications today, the name of the game is to avoid accidental complexity – as opposed to essential complexity – wherever possible. This is true of software engineering in general, but it becomes even more important when adding ML to software, because the essential complexity of ML itself leaves you even less budget to squander.

In practice, this means adopting (and building) the minimum number of new ML tools, APIs, platforms and services. And for those you do adopt, ensure they have bounded scope, in line with the Unix philosophy of tools that serve a single purpose through simple APIs.

Over time, our software development and deployment APIs will continue to blend together and expand to encompass ML, just as they have grown to handle past innovations in software development.