Evolving a data integration strategy

Linking IT systems together is never going away, but the approach CIOs adopt is key to ensuring relevant data feeds the decision-making process

The idea of integrating systems is nothing new. IT departments have constantly struggled to link enterprise applications together in a way that gives decision-makers the data they need at their fingertips. Ideally, this information should all be pulled into a single dashboard. But enterprise IT is constantly evolving, and each new application provides a new data source. There is often a lag between when an application is deployed and when IT can fully integrate it into an enterprise dashboard.

Today’s IT is exponentially more complex. There are the internal applications, which have remained a constant integration headache for CIOs. Then there are the cloud-based deployments and software as a service (SaaS), which means enterprise data now resides in two totally separate spheres. Add in connectivity to business partners and third-party sources, and it soon becomes clear that IT is facing an uphill struggle in dealing with numerous silos of enterprise data in a hybrid world.

Then there is the added complexity that arises with multicloud and hybrid cloud deployments. In fact, nearly half of the 1,700 UK IT decision-makers surveyed by Vanson Bourne in a study for Nutanix identified integrating data across different environments (49%) as their top challenge in multicloud deployment.

These disparate data sources need to be pulled into decision support systems. But moving beyond traditional business intelligence, data integration also has an essential role in advanced analytics, artificial intelligence (AI) and machine learning (ML).

“There is also the non-trivial matter of data governance in a hybrid world,” says Freeform Dynamics distinguished analyst Tony Lock, “especially one where cloud providers offer advanced machine learning and analysis tools that can operate on huge volumes of data coming from multiple sources. Any analysis that includes information from diverse data sources means you must have effective data governance in place.”

Data integration as a stepping stone towards AI

Humans require a lot of information to make sense of the world, so today’s more primitive computer algorithms need far more data, says computer expert Junade Ali.

While artificial intelligence (AI) and machine learning (ML) algorithms are getting ever better at doing more with less, we still often need to bring together data from multiple sources for them to produce results that make sense. Humans require a lot of information to make sense of the world, so our current, more primitive computer algorithms surely need far more.

A challenge in ML is the ability of algorithms to understand causality. So far, much of what AI algorithms do is find correlations between data points, as opposed to understanding causal relationships. Improving causal reasoning in AI offers the opportunity to do more with less when it comes to data. Microsoft Research, for one, has a group currently working on improving “causality in machine learning”, but there is still more work to be done.

Until such a time as we overcome these challenges in AI, data integration will remain an important part of ensuring we can give our constrained ML algorithms the data they need to provide meaningful outputs. It isn’t just about the volume of data, but also the dimensionality: ML algorithms need a full view of all data attributes to have a better chance of reaching the right conclusions. For this reason, before embarking on your AI revolution, you must get your ducks in a row when it comes to your data.
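
To make the dimensionality point concrete, here is a minimal Python sketch (the scikit-learn library and dataset are illustrative choices, not from the article): the same model trained with most of its attributes hidden, as if they were locked in an unintegrated silo, will typically score worse.

```python
# Illustrative sketch: how missing data attributes (reduced
# dimensionality) can degrade a model's conclusions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # 30 attributes per record
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Full dimensionality: the model sees every attribute
full = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Reduced dimensionality: only the first 10 attributes, as if the
# rest sat in an unintegrated silo
reduced = RandomForestClassifier(random_state=0).fit(X_train[:, :10], y_train)

print("all 30 attributes:", accuracy_score(y_test, full.predict(X_test)))
print("first 10 only:    ", accuracy_score(y_test, reduced.predict(X_test[:, :10])))
```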


Junade Ali is an experienced technologist with an interest in software engineering management, computer security research and distributed systems.

Beyond a lack of sufficient data governance, poorly integrated data also leads to poor customer service. “In the digital economy, the customer expects you to know and have ready insight into every transaction and interaction they have had with the organisation,” says Tibco CIO Rani Johnson. “If a portion of a customer’s experience is locked in a silo, then the customer suffers a poor experience and is likely to churn to another provider.”

Breaking down such silos of data requires business change. “Building end-to-end data management requires organisational changes,” says Nicolas Forgues, former chief technology officer (CTO) at Carrefour, who is now CTO at consulting firm Veltys. “You need to train both internal and external staff to fulfil the data mission for the company.”

Businesses that lack a business-wide approach to data management and a strategy for integrating silos risk missing the bigger picture, whether that means spotting trends or identifying early indicators of change.

In Johnson’s experience, one of the reasons for poor visibility of data is that business functions and enterprise applications are often decentralised. While the adoption of software as a service has increased connectivity, it has also allowed more data silos to grow silently, she says.

“To get the most from SaaS, organisations need integrated data that draws information from across the organisation, interconnecting with application programming interfaces (API) within the organisation and its partner network,” says Johnson. “Without an integrated data strategy, organisations fail to maximise the opportunity of SaaS but also pose a risk to the organisation through unidentified weaknesses in data security and privacy. These will lead to a reduction in data quality and, therefore, business confidence in the data.”

Technology legacy

Given that the volume and use of data are ever-changing, data access infrastructure built just a few years ago is no longer suitable for modern, data-intensive use cases.

Analyst firm Forrester recently identified three such stalwarts of the data integration stack that CIOs need to start decommissioning. In The Forrester tech tide: Enterprise data integration, Q4 2021, the analyst firm recommends that IT leaders consider divesting their existing enterprise information integration (EII), enterprise service bus (ESB), and traditional/batch extract, transform and load (ETL) tools.

While EII has been a key data integration technology for decades, Forrester’s research found that it has failed to meet new business requirements for real-time integration, semi-structured data, cloud services integration and support for self-service capabilities. According to Forrester, software providers of EII are now repositioning their offerings towards the emerging data virtualisation, data fabric and data services markets.

It’s a similar story with ESB. Forrester notes that enterprises are moving away from ESB technology to new offerings based around integration platform as a service (iPaaS), data fabrics and data virtualisation.

With ETL, access to legacy data is still an issue many organisations face. But the batch movement of this data using ETL is no longer aligned to business requirements. In the report, Forrester analysts note that demand for real-time streaming, increased data volume, support for hybrid and multicloud platforms, and new data sources have greatly hindered the technology’s growth.

“Because most new analytical deployments leverage the public cloud, new cloud data pipeline/streaming tools will become the standard ETL tool,” the Forrester analysts predict.
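
To make the shift Forrester describes concrete, here is a minimal Python sketch contrasting the two patterns. The transform, the record shapes and the in-memory “warehouse” list are all hypothetical stand-ins for illustration, not any particular vendor’s API.

```python
# Illustrative only: hypothetical records and an in-memory list
# standing in for a cloud warehouse table.

def normalise(record: dict) -> dict:
    # Hypothetical transform step: coerce fields to canonical types
    return {"id": int(record["id"]), "amount": float(record["amount"])}

# Batch ETL (the pattern Forrester suggests divesting): extract
# everything on a schedule, transform, then load in one bulk write.
# Downstream data is only as fresh as the last scheduled run.
def batch_etl(source_rows: list, warehouse: list) -> None:
    warehouse.extend(normalise(row) for row in source_rows)

# Streaming pipeline (the replacement): transform and load each
# record as it arrives, so consumers see near-real-time data.
def on_event(event: dict, warehouse: list) -> None:
    warehouse.append(normalise(event))

warehouse = []
batch_etl([{"id": "1", "amount": "9.99"}], warehouse)  # nightly job
on_event({"id": "2", "amount": "4.50"}, warehouse)     # live event
print(warehouse)
```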

Taking the lead in data integration

Looking at a modern data pipeline, Alex Housley, CEO and founder of machine learning operations (MLOps) specialist Seldon, says data scientists spend much of their time cleaning data. A data engineer’s job involves pulling data in from different sources to create live data streams that update a central data lake, avoiding manual ETL.

To achieve this, he says, “we are seeing increasing adoption of [Apache] Kafka for streaming data”. Once in a central place, Housley says data citizens in business units can then consume the data.
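
As a rough sketch of that pattern, the snippet below uses the open source kafka-python client (one of several Kafka clients). The broker address, topic name, payload shape and loader function are all assumptions for illustration, and it needs a running Kafka broker to execute.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

def write_to_data_lake(record: dict) -> None:
    # Hypothetical loader: in practice this would land the record
    # in the central data lake
    print("landing in data lake:", record)

# A source system publishes each record as it happens...
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("customer-events", {"customer_id": 42, "action": "purchase"})
producer.flush()

# ...and a downstream consumer continuously feeds those records into
# the data lake, replacing a scheduled batch ETL job.
consumer = KafkaConsumer(
    "customer-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    write_to_data_lake(message.value)
```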

Building on the idea of having a data pipeline for data citizens, Tibco’s Johnson believes data integration needs to be part of the day-to-day business of running enterprise IT, rather than a one-off project.

“For data integration to succeed, it has to be a constant programme of business improvement that is iterative,” she says. “This approach to data integration thrives on balanced expectations, long-term resource commitment, regular reprioritisation, agile business partnership, and continuous learning and improvement. Data integration will therefore be business-critical and critical to the success of CIOs and their transformation agenda.”

Johnson says data integration relies on the key strengths of the IT team and the CIO. In her experience, the IT department and the CIO have a 360-degree view of both the customer and the health of the business, which means a CIO can provide a balanced and unbiased view of the entire business.

“In my experience, this is a broad viewpoint, just like finance, as the IT team sees the big picture of the entire business, whilst our peers in other business lines have near-term targets and pressures that inevitably, and necessarily, shape their view to be more personal,” she says.

Read more about data integration

  • Integrating data is one of the thorniest challenges in business intelligence and analytics – achieving it is technically and organisationally complex, while it is gaining in importance all the time.
  • To accelerate their performance in data integration, companies are evaluating and adopting a range of contributing technologies.
