AI developer toolset series: Aricent's 'big 4' Deep Learning challenges
The Computer Weekly Developer Network is in the engine room, covered in grease and looking for Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) tools for software application developers to use.
This post is part of a series which also runs as a main feature in Computer Weekly.
With so much AI, ML & DL power in development and so many new neural network brains to build for our applications, how should programmers ‘kit out’ their AI toolbox?
Aricent
The following text is written by Subhankar Pal, AVP of technology and innovation at global design and engineering company Aricent.
Pal analyses the big 4 Deep Learning challenges that AI/ML developers must overcome and writes as follows…
For many developers, one of the most challenging AI/ML areas to code has been Deep Learning – in particular, making Deep Learning systems learn from as few examples and as little data as possible, an approach known as one-shot learning.
Deep Learning is used to solve highly complex problems that have eluded computer scientists for decades. Now, thanks to increasing processing power, massive amounts of available data, and more advanced neural network algorithms, data scientists are closer to taming Deep Learning.
There are however challenges that need to be addressed.
1. Taming neural networks
The complexity of a neural network can be expressed through the number of parameters. In the case of deep neural networks, this number can be in the range of millions, tens of millions and in some cases even hundreds of millions. Let’s call this number P.
Since you want to be sure of the model’s ability to generalize, a good rule of thumb for the number of data points is at least P*P.
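To give a feel for how quickly P grows, here is a minimal sketch (assuming TensorFlow 2.x and its bundled Keras API; the layer sizes are illustrative, not from the original text) that counts the parameters of a small fully connected network:

```python
# Minimal sketch (assumes TensorFlow 2.x with its bundled Keras API):
# even a modest fully connected network quickly reaches millions of parameters.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224 * 224 * 3,)),   # a flattened 224x224 RGB image
    layers.Dense(64, activation="relu"),     # 150528*64 + 64 weights and biases
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # e.g. 10 output classes
])

print(model.count_params())  # P: roughly 9.6 million parameters here
```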
Neural networks are essentially “black boxes” to developers: they provide few clues about what is happening inside, and researchers have a hard time understanding how they reach their conclusions. This inability to explain their reasoning at an abstract level makes it difficult to implement high-level cognitive functions. Their operation is also largely opaque to humans, rendering them unsuitable for domains where verification of the process is important.
2. Optimal Hyperparameter Optimization
Hyperparameters are values set before the learning process begins. Changing a hyperparameter by even a small amount can invoke a large change in a model’s performance.
Relying on the default values and not performing hyperparameter optimisation can have a significant impact on a Deep Learning model’s performance. So can hand tuning only a handful of hyperparameters rather than searching over them systematically with proven methods, as in the sketch below.
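As a minimal sketch of what a systematic search looks like (assuming scikit-learn is available; the dataset, model and parameter values are illustrative, not from the original text):

```python
# Minimal sketch (assumes scikit-learn): searching over hyperparameters
# instead of relying on the library defaults.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Candidate hyperparameter values; small changes can shift performance noticeably.
param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "alpha": [1e-4, 1e-3, 1e-2],          # L2 regularisation strength
    "learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=3,        # 3-fold cross-validation
    n_jobs=-1,   # use all available cores
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```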
3. Lack of flexibility
Deep Learning is currently specialised for specific domains of applications.
Even solving very similar problems requires retraining and reassessment. Developers and researchers are working hard to develop Deep Learning models that can multitask without the need for reworking and re-architecting. This remains a major challenge.
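To illustrate the retraining cost, here is a minimal transfer-learning sketch (assuming TensorFlow 2.x with Keras and access to the pre-trained ImageNet weights; the 5-class head and the commented training call are hypothetical): even when features are reused from an existing network, a new task still needs its own head and its own training run.

```python
# Minimal sketch (assumes TensorFlow 2.x with Keras): reusing a network trained
# on ImageNet for a related task still requires adding and retraining a
# task-specific head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the features learned on the original task

# A new classification head for a hypothetical 5-class problem.
model = models.Sequential([
    base,
    layers.Dense(5, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_images, new_labels, epochs=5)  # retraining is still required
```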
4. The Python vs. R vs. Julia conundrum
The debate around the choice of Python vs. R vs. Julia has been heated. Using the wrong language can be sacrilegious for some. Python seems to be winning the battle as the preferred language for Machine Learning. R is preferred by traditional statisticians while Python is recommended for most developers. Languages like Julia are gaining popularity, but it is Python that has the best data science ecosystem.
If you are a Python developer, start with Scikit-Learn to build basic models (as in the sketch below) before exploring advanced toolkits such as Caffe2 and Keras. Most of these open source tools are meant for Deep Learning, which is an advanced technique of Machine Learning. The combination of Python and Scikit-Learn provides enough abstraction for developers to get started on their Machine Learning journey.
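For completeness, a minimal “getting started” sketch with Scikit-Learn (assuming the library is installed; the Iris dataset and random forest model are illustrative choices, not prescribed by the original text):

```python
# Minimal sketch (assumes scikit-learn): a basic model built with Scikit-Learn
# before moving on to Deep Learning toolkits such as Keras or Caffe2.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(accuracy_score(y_test, clf.predict(X_test)))
```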