Data virtualisation on rise as ETL alternative for data integration
The Phone House and Novartis have turned to data virtualisation from Denodo and Composite to gain a single logical view of disparate data sources.
Data virtualisation is emerging as a technique for tying together disparate databases, helping businesses become more agile both in their operations and in their data integration processes.
Traditionally, companies have relied on data integration technologies, such as extract, transform and load (ETL) tools, to pull data from transactional systems and populate data warehouses for business intelligence (BI) and analytics uses. But for applications that require real- or near-real-time decision making, getting critical business insight out of an ETL-fed data warehouse can seem as effective as sending Lewis Hamilton out to qualify for a Grand Prix in an Alfa Romeo Series 1 Spider. The iconic 1960s roadster is a lovely machine, but one that’s likely to fall far short of Formula One’s uncompromising need for speed.
Another challenge for ETL processes is the increasingly large number of data sources that organisations are looking to tap. Such pressures are exemplified by the pharmaceutical industry. Every year, billions of pounds are poured into research and development efforts, with companies hungering to create new and improved drugs that can provide booster shots to their businesses. Data is the lifeblood of pharmaceutical makers -- and there is no dearth of it for them to analyse.
As Fatma Oezdemir-Zaech, a knowledge engineering consultant at Switzerland-based Novartis Pharma, explained, her IT team serves a research department that needs to pull data from a huge variety of sources. That may include troves of trial research from medical publishers or commercial data sources, along with an abundance of data from internal systems. “Our team has extensive experience and skills in using ETL, and there are procedures that can be done in a semi-autonomous way,” said Oezdemir-Zaech. “But the more data sources we used, the more time it was taking to get the data in the format we want.”
Traditional data warehouses haven’t become redundant, said Gary Baverstock, UK regional director at data virtualisation vendor Denodo Technologies. But as the pressure for real-time insight and increased business agility intensifies, and companies increasingly look to use external data sources, many IT chiefs are seeking alternative ways to deliver data to business users.
Data virtualisation keeps data in its place
One option is data virtualisation, which provides a layer of abstraction that sits atop enterprise applications, data warehouses, transaction databases, Web portals and other data sources, enabling companies to pull together data from different systems without creating and storing new copies of the information. Because the data stays in the source systems, there is nothing to replicate or move, which reduces IT workloads as well as the risk of introducing data errors.
The technology also supports writing transaction data updates back to the source systems. This, proponents say, is one of the clear benefits that set data virtualisation apart from data federation and enterprise information integration (EII), two earlier techniques with similar aims of making it easier to analyse data from a disparate array of sources.
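To make the idea concrete, here is a minimal sketch of both behaviours in Python, with two in-memory SQLite databases standing in for hypothetical source systems: a virtual view that federates a read across live sources at query time, and an update routed back to the system that owns the data. It is an illustration of the concept only, not the API of Denodo, Composite or any other vendor's product.

```python
# Conceptual sketch of data virtualisation: query live sources on demand and
# join the results in memory, rather than copying data into a warehouse.
# The source systems, tables and columns below are hypothetical.
import sqlite3

# Two stand-in source systems (say, a CRM and a billing database).
crm = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")

crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Ana Perez', 'ana@example.com')")

billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 49.90)")

def customer_spend(customer_id):
    """Federated read: pull from both sources at request time, join in memory."""
    name, email = crm.execute(
        "SELECT name, email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    total = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()[0]
    return {"name": name, "email": email, "total_spend": total}

def update_email(customer_id, new_email):
    """Write-back: route the update to the system that owns the data."""
    crm.execute("UPDATE customers SET email = ? WHERE id = ?", (new_email, customer_id))
    crm.commit()

print(customer_spend(1))                  # combined view, no copy stored anywhere
update_email(1, "ana.p@example.com")
print(customer_spend(1))                  # reflects the change immediately
```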
While data virtualisation shares some capabilities with data federation and EII, and the three are sometimes viewed as the same thing under different names, EII technology offered a read-only approach to data querying and reporting, said Brian Hopkins, a US-based analyst with Forrester Research.
Data federation emerged more than a decade ago and was meant to do away with ETL tools, data staging areas and the need to create new data marts. But critics say its initial promise masked key weaknesses: Data federation software was ill-suited to very large data sets or environments requiring complex data transformations. Worse still, it was, in the minds of many, intimately linked to the world of service-oriented architecture (SOA).
“There were a lot of good things associated with SOA, such as the efforts to drive complexity from organisations’ IT infrastructure, break down the information silos and untangle the spaghetti diagram of IT architecture,” said Baverstock. “But as the economic winds shifted, these tremendously complex IT projects fell out of favour, as businesses focused on those efforts that would bring quick wins.”
More on data virtualisation
- Get this quick definition of data virtualisation
- ETL vs ELT? Understand the differences between them
- Learn how HealthNow New York plumped for Informatica over IBM and Composite
- Data virtualisation authority Rick van der Lans explains how the technology will impact data governance
Retailer looks to drive out data errors
The Phone House -- the trading name for the European operations of UK-based mobile phone retail chain Carphone Warehouse -- implemented Denodo’s data virtualisation technology between its Spanish subsidiary’s transactional systems and the Web-based systems of mobile operators. The deciding factor was the tools’ dual read-and-write capability, said David Garcia Hernando, business exchange manager for The Phone House Spain.
The retailer acts as an intermediary between its customers and the mobile operators. But, Hernando said, Phone House’s sales staff had to enter customer data into the company’s internal systems and then rekey it into the mobile operators’ systems because the different applications could not talk to each other.
“Whenever you have manual data entry, you're going to create errors,” said Hernando. “We'd have customer records that didn't match those held by the operators, and that was costing us money.”
Entering each record once instead of twice cut data entry time in half. And with approximately 1.5 million transactions processed each year in Spain, that was a huge productivity boon for the retailer’s sales teams.
While there were simpler ways to achieve the integration, Hernando knew that the data virtualisation tools could provide other benefits, too. “Our invoicing system and CRM systems are pretty good, but they're 20 years old, so it can be tough when you want to introduce new things quickly,” he said. “But thanks to the Denodo technology, we can create new reports wanted by the business really quickly.”
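As a rough illustration of the pattern Hernando describes, in which customer data is keyed once and propagated to both the retailer's own systems and an operator's web service, the Python sketch below uses invented function names, fields and endpoints; it is not The Phone House's or Denodo's actual implementation.

```python
# Hypothetical sketch: a thin integration layer takes one customer record and
# writes it to both the retailer's internal systems and the mobile operator's
# web service, so sales staff key the data only once.
import json
import urllib.request

def save_internally(customer: dict) -> None:
    """Stand-in for the insert into the retailer's own CRM and invoicing systems."""
    print("saved to internal systems:", customer)

def send_to_operator(customer: dict, url: str) -> int:
    """POST the same record to the operator's web-based system."""
    req = urllib.request.Request(
        url,
        data=json.dumps(customer).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises if the operator rejects the record
        return resp.status

def register_customer(customer: dict, operator_url: str) -> int:
    """Single point of entry: the record is keyed once and lands in both systems."""
    save_internally(customer)
    return send_to_operator(customer, operator_url)
```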
Phone House’s data virtualisation experience is typical of many of the implementations Forrester sees. "Most organisations get into data virtualisation for tactical reasons, but once that's done they find that not having to physically move the data around for integration has much wider use cases," said Hopkins.
Data virtualisation: no limits?
It's a similar tale at Novartis, which implemented a data virtualisation tool from Composite Software to enable its researchers to quickly combine data from both internal and external sources into a searchable virtual data store. “Our particular challenge was taking vast column-based biological data sets from external sources and integrating that with our own Oracle database,” said Oezdemir-Zaech. “But Composite built us a proof of concept within three days. Once we were able to get easy access to all those data sources, the idea really took hold.”
She added that with data virtualisation, “there are no limitations -- it doesn't matter whether the data sets are huge or tiny. For us, that's really important.”
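A hedged sketch of that kind of combination might look like the following: selected columns are read from an external, column-oriented data set and joined at query time with the result of a query against an in-house database, with no new copy persisted. The file name, column names and the SQLite stand-in (Novartis’s real internal source is an Oracle database, accessed through Composite’s software) are assumptions for illustration.

```python
# Illustrative only: combine an external columnar data set with an in-house
# relational table on demand, without loading either into a new store.
import sqlite3
import pandas as pd

def virtual_compound_view(external_csv: str, internal_db: str, project: str) -> pd.DataFrame:
    # Pull only the columns we need from the external column-based data set.
    external = pd.read_csv(external_csv, usecols=["compound_id", "assay_result"])

    # Query the internal database for matching reference data at request time.
    with sqlite3.connect(internal_db) as conn:
        internal = pd.read_sql_query(
            "SELECT compound_id, project, owner FROM compounds WHERE project = ?",
            conn,
            params=(project,),
        )

    # Join in memory and hand the combined view to the researcher; nothing is
    # written back to disk, so there is no second copy to keep in sync.
    return external.merge(internal, on="compound_id", how="inner")
```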
Until now, organisations may have been tempted to make their data easier to manage by embarking on a database consolidation programme. That has some obvious advantages, Hopkins said. “But it is a massive undertaking,” he warned. “It’s hard enough for structured data, never mind the morass of unstructured data swirling around the enterprise. Data virtualisation promises to deliver some of the same benefits -- most obviously, the ease of analysing data -- without the burden of massive data and application integration.”
Such benefits, combined with the belief that tactical data virtualisation projects will give rise to more strategic programmes designed to treat data as a utility-like service, lead Forrester to predict that demand for data virtualisation is set to boom. It anticipates that organisations will spend $8 billion globally on data virtualisation licences, maintenance and services by 2014.
Still, even data virtualisation vendors acknowledge that the technology isn’t the answer to all data integration questions. “Data virtualisation is not the apogee of information management that means you can do away with all the other tools you've relied on over the years,” said Ash Parikh, director of product management at Informatica. “It's like a Swiss Army knife -- this is just one of the tools to get the job done.”