In-memory for continuous (& simpler) deep learning Part 2
This is a guest post for the Computer Weekly Developer Network written by Nikita Ivanov, CTO and co-founder of GridGain Systems.
GridGain specialises in software and services for big data systems using in-memory computing techniques to increase data throughput and minimise latency for data-intensive applications across any type of data store.
A hard-core passionate developer at heart, Ivanov is an active member of the Java middleware community and a contributor to the Java specification.
In this two-part series, Ivanov examines the role of in-memory computing in continuous machine learning and, crucially, how it enables us to progress on the path to ‘simpler’ deep learning.
Ivanov writes…
In the first part of this two-part series, we looked at how in-memory computing (IMC) can deliver application speed and scalability by distributing processing across a cluster of commodity servers, which can be deployed on-premises, in a public or private cloud, or in a hybrid environment.
Data in an underlying database is synchronised with an in-memory data grid (IMDG), which processes transactions in memory and then writes them to the underlying database. This ensures data consistency and availability.
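To make the write-through idea concrete, here is a minimal sketch in Python. It is a toy illustration, not the API of any real IMDG product: the `WriteThroughGrid` class and the key names are assumptions made up for this example, and a plain dictionary stands in for the underlying database.

```python
class WriteThroughGrid:
    """Toy in-memory data grid that writes through to a backing store.

    Hypothetical illustration only; not a real IMDG API.
    """

    def __init__(self, database):
        self.database = database      # stands in for the underlying database
        self.memory = dict(database)  # initial sync: load existing rows into memory

    def put(self, key, value):
        self.memory[key] = value      # the transaction is processed in memory first
        self.database[key] = value    # ...and then written to the underlying database

    def get(self, key):
        return self.memory[key]       # reads are served from memory


database = {"account:1": 100}
grid = WriteThroughGrid(database)

# Debit one account and create another; both stores stay in step.
grid.put("account:1", grid.get("account:1") - 30)
grid.put("account:2", 50)
```

Because every write lands in both places, the in-memory copy and the database never diverge in this sketch, which is the consistency property described above. A real grid would also handle distribution, failures and concurrent access.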
Using an IMDG enables companies to overcome the biggest challenge to real-time performance today: legacy architecture.
Most companies still rely on the traditional bifurcated OLTP and OLAP model that requires an extract, transform and load (ETL) process to periodically move data from an online transactional processing database to an online analytical processing database. The ETL process introduces significant delays that prevent real-time data analysis and action.
Translytical processing
The performance and scalability of the in-memory computing platform enable a unified architecture for transactions and analytics, which is referred to as hybrid transactional/analytical processing (HTAP), hybrid operational/analytical processing (HOAP), or translytical processing.
NOTE: According to the recent Forrester Wave report cited on InfoWorld, the term ‘translytics’ is defined as: “A unified and integrated data platform that supports multi-workloads such as transactional, operational, and analytical simultaneously in real time, leveraging in-memory capabilities including support for SSD, flash, and DRAM and ensures full transactional integrity and data consistency.”
IMC-powered HTAP systems replace separate transactional and analytical infrastructures, eliminating the need for ETL.
IMC into deep (& continuous) learning
Because in-memory computing platforms unify transactional and analytical processing, they do not have to move data to a separate machine learning (ML) platform. This enables enterprises to run ML training in place on the operational data in memory… and means they can update their ML models more frequently.
Some IMC platforms also include integrated machine learning training features, which allow them to train models using the data in the system without moving any data to a separate ML training platform. Training in place lets users update their ML models more frequently and eliminates the cost of separate ML infrastructure.
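As a rough sketch of what updating a model in place can look like, the Python below refines a tiny linear model with one stochastic gradient descent step each time a record is ingested. Everything here is an assumption for illustration: `OnlineLinearModel`, `ingest` and the synthetic data are hypothetical, and a real IMC platform's training features would look quite different.

```python
class OnlineLinearModel:
    """Toy model y = weight * x + bias, updated one record at a time."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        # One stochastic gradient descent step on the squared error.
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


operational_store = []  # stands in for operational data held in memory
model = OnlineLinearModel()


def ingest(x, y):
    """Store the record and update the model in the same step (no ETL hop)."""
    operational_store.append((x, y))
    model.update(x, y)


# Synthetic records that follow y = 2x + 1 (made-up data for the sketch).
for step in range(5000):
    x = (step % 10) / 10.0
    ingest(x, 2.0 * x + 1.0)
```

The point of the sketch is the shape of the loop: the model is refreshed as the data arrives, rather than after a periodic export to a separate training system.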
An IMC platform can also be used as online storage for deep learning (DL) engines. DL algorithms are particularly compute-intensive, so companies running solutions like TensorFlow typically run them on GPUs, using specialised frameworks and hardware.
Without an IMC platform, they must periodically move their operational data via ETL into a separate data repository, which is often based on Hadoop. They then feed the data into TensorFlow for deep learning model training.
A company using an IMC platform that includes native integration with TensorFlow, however, can pre-process and feed its operational data directly into TensorFlow. This eliminates the cost and complexity of separate deep learning infrastructure.
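The pre-process-and-feed step can be pictured with a short Python sketch. It does not use TensorFlow or any real integration API; the record fields (`amount`, `items`, `label`), the scaling and the `batches` helper are all hypothetical, chosen only to show mini-batches being produced straight from in-memory records.

```python
def preprocess(record):
    """Turn a raw record into a (features, label) pair; fields are hypothetical."""
    features = [record["amount"] / 1000.0, float(record["items"])]
    return features, record["label"]


def batches(store, batch_size):
    """Yield pre-processed mini-batches directly from the in-memory records."""
    batch = []
    for record in store.values():
        batch.append(preprocess(record))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Stand-in for operational data already held in memory.
in_memory_store = {
    1: {"amount": 250.0, "items": 2, "label": 0},
    2: {"amount": 1200.0, "items": 5, "label": 1},
    3: {"amount": 80.0, "items": 1, "label": 0},
}

all_batches = list(batches(in_memory_store, batch_size=2))
```

A training loop in a deep learning framework could consume an iterator like `batches` directly, so no intermediate repository sits between the operational data and the model.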
A new normal?
The ability of in-memory computing platforms to support HTAP and deliver increased application performance and scalability makes them, I would argue, an essential technology for companies pursuing digital transformations.
Gartner defines “in-process HTAP” as a system that can make real-time decisions based on operational data and update business processes in real time. Implementing an in-process HTAP application requires an in-memory computing platform with a continuous learning capability to achieve the performance and real-time machine learning model updates necessary for success.
Gartner also predicts that at least 25% of large, global enterprises will adopt platforms that combine multiple in-memory computing technologies in order to reduce their in-memory computing infrastructure complexity. It further predicts that 75% of cloud-native application development will utilise in-memory computing or services powered by IMC to implement high-scale/high-performance applications.
Using a continuous learning capability integrated into an in-memory computing platform, companies can achieve the application speed and scalability they need to succeed with their digital transformation and omnichannel customer experience initiatives.
This architecture also potentially reduces infrastructure costs for ML and DL. IMC platforms can minimise the time between data ingestion, analysis, and business decisions driven by machine learning. This enables companies to deliver more relevant services to customers, drive real-time business decision making at the point of contact with end users, and much more.