Automated Machine Learning: the promises and the pitfalls

This is a guest post for the Computer Weekly Developer Network written by Paul Clough in his role as data scientist at UK-headquartered business analytics company Peak Indicators. Clough is also Professor of Search & Analytics at the University of Sheffield.

Professor Clough argues that the adoption of Machine Learning (ML) and Artificial Intelligence (AI) technologies in business is increasingly common, and that these technologies are now being used to support mainstream activities: improving processes and decision-making and providing new services.

As a reminder, Machine Learning refers to software programs that automatically improve their outputs through ‘observed experience’ (i.e. exposure to dataflows and datasets).

ML is now argued to be core to areas ranging from predictive and prescriptive analytics to digital task automation, and it sits at the heart of the hype around ‘augmented analytics’, Enterprise AI and so-called Analytics 4.0. So what should we think going forward?

Clough writes as follows…

The ML space is somewhat in a state of flux, and organisations face significant barriers, especially where employees lack expert ML knowledge (despite the rise of so-called ‘citizen data scientists’).

Users of ML tools often have to make a series of choices: how should data be processed, which features should be used, which algorithms should be selected, how should models be tuned and refined, and how should models be deployed? Frankly, it can be overwhelming.

Even for a novice with strong analytical capabilities, the entire ML process can be daunting, and businesses can end up with under-performing or, in the worst case, incorrect models.

Automated Machine Learning

Not to fear, though: Machine Learning is coming to the rescue of Machine Learning! Increasingly, stages of the ML pipeline are being automated using ML techniques, giving rise to Automated Machine Learning (AutoML) tools, both commercial (e.g. DataRobot, Dataiku DSS, Google Cloud HyperTune) and open source (e.g. Auto-WEKA, auto-sklearn, H2O, TransmogrifAI and TPOT).

Despite the current hype, automated ML is not new: it has been around for at least two decades, having begun as automated predictive modelling.

As the name suggests, AutoML tools help to automate the stages of Machine Learning, which typically comprise data preparation (normalisation, transformation and scaling, feature extraction and feature engineering), model building and training (model testing, model selection, hyper-parameter tuning, model validation) and model deployment.
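To make these stages concrete, here is a minimal, hand-rolled Python sketch of the kind of pipeline an AutoML tool automates. It is not the API of any tool named above: the function names (`fit_scaler`, `run_pipeline` and so on) are hypothetical, and a nearest-centroid classifier stands in for the model purely to keep the example short. Standardisation plays the part of data preparation, the train/validation split and accuracy check stand in for model validation, and returning one callable that bundles the scaling with the model is a toy version of deployment.

```python
import random
from statistics import mean, pstdev


def fit_scaler(rows):
    """Data preparation: learn per-feature mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; divide by 1.0 instead to avoid ZeroDivisionError.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, scaler):
    """Standardise each feature using previously learned statistics."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fit_nearest_centroid(rows, labels):
    """Model building (hypothetical choice of model): store each class's mean feature vector."""
    centroids = {}
    for label in set(labels):
        members = [row for row, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=squared_distance)


def run_pipeline(X, y, valid_fraction=0.25, seed=0):
    """Split, prepare, train, validate and 'deploy' in one pass."""
    # Split first so the validation rows never influence data preparation.
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - valid_fraction))
    train_idx, valid_idx = order[:cut], order[cut:]

    scaler = fit_scaler([X[i] for i in train_idx])
    model = fit_nearest_centroid(transform([X[i] for i in train_idx], scaler),
                                 [y[i] for i in train_idx])

    # Model validation: accuracy on the held-out rows.
    valid_rows = transform([X[i] for i in valid_idx], scaler)
    correct = sum(predict(model, row) == y[i] for row, i in zip(valid_rows, valid_idx))
    accuracy = correct / len(valid_idx)

    # Toy "deployment": bundle the fitted scaler and model into one callable.
    def deployed(row):
        return predict(model, transform([row], scaler)[0])

    return deployed, accuracy
```

Note that the scaling statistics are learned from the training rows only and then reused at prediction time; getting this wrong is a classic source of over-optimistic validation scores, and exactly the kind of detail an AutoML tool handles on the user's behalf.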

Hyper-parameter tuning

Originally, AutoML tools automated model selection and hyper-parameter tuning, tasks that often require searching through huge numbers of possible settings to find the best-performing model (i.e. an optimisation problem).
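Framed as an optimisation problem, model selection and hyper-parameter tuning amount to searching a space of candidate configurations for the one with the best validation score. The sketch below assumes the simplest strategy, random search, over a hypothetical joint space of two model families ("knn" with a neighbour count `k`, "tree" with a `max_depth`). `toy_validation_score` is an invented stand-in for training a model and scoring it on held-out data; in a real system that step is where almost all of the compute goes.

```python
import random


def sample_config(rng):
    """Draw one candidate from a hypothetical joint space of model families and hyper-parameters."""
    model = rng.choice(["knn", "tree"])
    if model == "knn":
        return {"model": "knn", "k": rng.randint(1, 25)}
    return {"model": "tree", "max_depth": rng.randint(1, 12)}


def toy_validation_score(config):
    """Hypothetical stand-in for 'train this configuration, return its validation accuracy'."""
    if config["model"] == "knn":
        return 0.90 - 0.002 * (config["k"] - 7) ** 2
    return 0.85 - 0.004 * (config["max_depth"] - 5) ** 2


def random_search(score_fn, n_trials=50, seed=0):
    """Model selection plus hyper-parameter tuning: keep the best-scoring sampled configuration."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    print(random_search(toy_validation_score, n_trials=200))
```

Real AutoML tools generally use smarter search strategies than this, for example Bayesian optimisation in auto-sklearn or genetic programming in TPOT, but the shape of the problem is the same: propose a configuration, score it, keep the best.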

However, automation is now spreading to other stages of the ML process… and indeed to the wider analytics and data science process, such as data cleaning, algorithm tuning and selection across multiple models, and the deployment and maintenance of models. This allows greater autonomy both in business processes and in the execution of the ML pipeline itself.

The result is twofold:

  • For the non-expert, who may have good business understanding but limited ML knowledge, the use of AutoML tools can help to reduce technical barriers, guiding them through the ML process and thereby opening up new opportunities.
  • On the other hand, for the expert, being able to automate aspects of the ML process, many of which can be laborious and time-consuming, is welcome: it frees up their time to focus on other areas, such as interpreting the outputs of the ML process, communicating insights and carrying out richer forms of analytical work, for example identifying new opportunities to apply ML and AI.

Despite the benefits afforded by AutoML, it is worth pointing out some of the challenges and pitfalls.

Challenges and pitfalls

It is important to understand that automation is not a simple replacement for Machine Learning expertise; rather, the tools support data workers.

Data and algorithmic literacy remains a key problem for organisations, and it does not go away with AutoML tools: users still need a basic understanding of the ML process and techniques, and senior management need to understand what AutoML can, and cannot, do.

Despite how AutoML is often marketed, it will not fully automate the end-to-end journey from data to insight (and action); some aspects will remain largely manual and require human intervention.
