MLflow Project joins Linux Foundation

What’s better than a machine learning platform? Answer: an end-to-end machine learning platform, obviously.

What’s better than an end-to-end machine learning platform? Answer: an open source end-to-end machine learning platform, obviously, obviously.

What’s better than an open source end-to-end machine learning platform? Answer: an open source end-to-end machine learning platform that resides under the auspices of the Linux Foundation, obviously, obviously, obviously.

Okay, enough of this, but this is what Databricks is hoping: the company has now said that its MLflow open source machine learning platform will join the Linux Foundation.

The project is two years old in 2020 and has seen engagement from over 200 contributors.

It is downloaded more than 2 million times per month.

Moving to the Linux Foundation gives it a vendor-neutral home with an open governance model, which, it is hoped, will broaden adoption and contributions.

Databricks says it created MLflow in response to the complicated process of ML model development. Traditionally, the process to build, train, tune, deploy and manage machine learning models is quite tough for data scientists and developers.

“The steady increase in community engagement shows the commitment data teams have to building the machine learning platform of the future. The rate of adoption demonstrates the need for an open source approach to standardising the machine learning lifecycle,” said Michael Dolan, VP of strategic programs at the Linux Foundation. “Our experience in working with the largest open source projects in the world shows that an open governance model allows for faster innovation and adoption through broad industry contribution and consensus building.”

ML model work more complex than coding

Unlike traditional software development, which is concerned only with versions of code, ML development also needs to track versions of data sets, model parameters and algorithms, which creates an exponentially larger set of variables to track and manage.
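To make that concrete, here is a minimal Python sketch of why a code version alone does not identify an ML run. It is an illustration only and does not use MLflow's API: the names RunRecord, dataset_digest and fingerprint are hypothetical, as are the sample values. The idea is that a run is defined by the combination of code, data, algorithm and parameters, so changing any one of them yields a different run to track.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def dataset_digest(rows):
    """Hash the rows of a data set so that a change in the data is detectable."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


@dataclass
class RunRecord:
    """Hypothetical record of everything that defines one training run."""
    code_version: str                 # e.g. a Git commit hash
    data_digest: str                  # hash of the training data
    algorithm: str                    # name of the learning algorithm
    params: dict = field(default_factory=dict)  # model parameters

    def fingerprint(self):
        # Serialise with sorted keys so parameter order does not matter.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


data_v1 = [(1.0, 0), (2.0, 1)]
data_v2 = [(1.0, 0), (2.0, 1), (3.0, 1)]

base = RunRecord("abc123", dataset_digest(data_v1), "logistic_regression",
                 {"lr": 0.1, "epochs": 10})
reordered = RunRecord("abc123", dataset_digest(data_v1), "logistic_regression",
                      {"epochs": 10, "lr": 0.1})
tuned = RunRecord("abc123", dataset_digest(data_v1), "logistic_regression",
                  {"lr": 0.01, "epochs": 10})
new_data = RunRecord("abc123", dataset_digest(data_v2), "logistic_regression",
                     {"lr": 0.1, "epochs": 10})

# Same code version throughout, yet three distinct runs to keep track of.
distinct_runs = {r.fingerprint() for r in (base, reordered, tuned, new_data)}
print(len(distinct_runs))
```

All four records share the same code version, but only the first two describe the same run. A version control system that tracks code alone would see no difference between any of them.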

In addition, ML is very iterative and relies on close collaboration between data teams and application teams.

Databricks claims that MLflow keeps this process from becoming overwhelming by providing a platform to manage the end-to-end ML development lifecycle from data preparation to production deployment, including experiment tracking, packaging code into reproducible runs, and model sharing and collaboration.

Matei Zaharia is the original creator of Apache Spark and creator of MLflow.

(Approved image source: MLflow)