Machine Learning brains need a training model

Machine Learning (ML) doesn’t just learn; it needs training… and there’s no ML training without an ML training model.

So who’s doing the training and who’s building the models?

Enterprise AI specialist Indico has a new open source project focused on enhancing the performance of machine learning for Natural Language Processing (NLP).

The Finetune project offers users a general-purpose language model for different tasks involved in text and document-based workflows.

‘Finetuning’ is a specific type of transfer learning designed to take a model trained on one task and adapt it to solve a different, but related task.

Users can make small modifications to repurpose an existing model to solve a new, related problem.
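
To make the idea concrete, here is a minimal sketch of fine-tuning in its simplest form, under toy assumptions. A logistic-regression model is trained on a plentiful "source" task. Its weights are then used as the starting point for further training on a related "target" task that has only 20 labeled examples. The tasks, data and function names (`train`, `make_task` and so on) are hypothetical. This is not how Finetune itself works: Finetune adapts a general-purpose language model, not a three-weight linear model.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def log_loss(weights, data):
    # Mean binary cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict_proba(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, weights, lr, epochs):
    """Stochastic gradient descent on log loss, starting from `weights`.

    Returns a new list; the starting weights are left untouched.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = predict_proba(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * error * xi
    return w


def make_task(n, rule, rng):
    # Hypothetical synthetic task: features are (bias, f1, f2) and the
    # label comes from the task's decision rule.
    data = []
    for _ in range(n):
        f1, f2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((1.0, f1, f2), 1 if rule(f1, f2) else 0))
    return data


rng = random.Random(0)
# Source task: plenty of labeled data.
source = make_task(1000, lambda a, b: a + b > 0, rng)
# Related target task: a slightly different boundary, only 20 labeled examples.
target = make_task(20, lambda a, b: a + 0.5 * b > 0, rng)

# Train on the source task from scratch, then fine-tune on the target task
# starting from the pretrained weights, with a smaller learning rate.
pretrained = train(source, [0.0, 0.0, 0.0], lr=0.1, epochs=20)
finetuned = train(target, pretrained, lr=0.05, epochs=50)

print("target loss before fine-tuning:", round(log_loss(pretrained, target), 3))
print("target loss after fine-tuning: ", round(log_loss(finetuned, target), 3))
```

Because `train` copies its starting weights, the pretrained model is left intact and could serve as the starting point for other related tasks as well.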

“Most organisations have NLP problems, but few have the labeled data they need to solve them with machine learning,” said Madison May, Indico machine learning architect and cofounder. “Finetune lets them do more with less labeled training data.”

The Finetune project extends original research and development work by OpenAI so that it can be applied to a wider range of problems.

OpenAI’s base project provides an illustrative model intended to improve the accuracy and performance of machine learning models on natural language content. It includes general capabilities for document classification, comparison and multiple-choice question answering.

The Finetune library packages that capability and adds support for further tasks such as document annotation, regression and multi-label classification.

Indico delivers Finetune with an interface that mimics a popular open source library, scikit-learn, and documents it so that users can write as few as five lines of code (vs. 200) to try out OpenAI’s research on their own data problems.
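
The "five lines of code" claim rests on the scikit-learn convention of constructing an estimator, calling `fit()` on labeled data and calling `predict()` on new data. The toy classifier below is a hypothetical illustration of that convention only. It scores classes by word overlap with the training texts, has no pretrained language model behind it, and is not Finetune's API.

```python
from collections import Counter, defaultdict


class ToyTextClassifier:
    """Hypothetical estimator following scikit-learn's fit/predict convention.

    It scores each class by how often a document's words appeared in that
    class's training documents; it is NOT Finetune's implementation.
    """

    def fit(self, texts, labels):
        # Count word occurrences per class label.
        self.word_counts_ = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts_[label].update(text.lower().split())
        self.classes_ = sorted(self.word_counts_)
        return self  # scikit-learn estimators return self from fit()

    def predict(self, texts):
        predictions = []
        for text in texts:
            words = text.lower().split()
            # Pick the class whose training vocabulary overlaps most with the text.
            best = max(
                self.classes_,
                key=lambda c: sum(self.word_counts_[c][w] for w in words),
            )
            predictions.append(best)
        return predictions


# Usage in the short, scikit-learn-like style the article describes.
train_texts = ["invoice total due", "payment overdue invoice",
               "meeting agenda notes", "agenda for the meeting"]
train_labels = ["billing", "billing", "meetings", "meetings"]
model = ToyTextClassifier().fit(train_texts, train_labels)
print(model.predict(["overdue invoice", "notes from the meeting"]))
# → ['billing', 'meetings']
```

Returning `self` from `fit()` is what lets construction and training be chained on one line, which is part of why scikit-learn-style code stays so short.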

The Indico team is conducting empirical research to evaluate how the models behave on different datasets and machine learning tasks. The company also plans to incorporate Finetune into its commercial product to address specific customer use cases.