
Comparethemarket uses Cloudera to gain single view of customer data

Price comparison site Comparethemarket has turned to Cloudera’s Hadoop distribution to get a better view of customer data from different channels. Doug Cutting, the father of Hadoop, explains

This article can also be found in the Premium Editorial Download: Computer Weekly: Will digital government remain on course?

Comparethemarket.com is using Cloudera’s Hadoop distribution to gain a single view of customer data from different places.

Doug Cutting, the founder of Hadoop and chief architect at Cloudera, said he doubted the UK price comparison website could have existed in the form it does before the advent of big data technologies.

“They wanted a single system where they could integrate all the data they have and get a better view of the customer that is up to date. And that will survive in that state as they incorporate new sources, new lines of business, new markets to compare, and understand new things about the businesses they are in. It gives them a lot of flexibility,” said Cutting.

He said there is no company exactly the same as Comparethemarket in the US: “There is Priceline, but that is more product than service oriented. You know, the US gets more credit than we deserve for innovation.”

Cloudera Data Hub clusters were installed at Comparethemarket in June 2014 and have been running Cloudera Manager for the past 10 months.

“Comparethemarket.com can now analyse and use the vast amounts of data it creates across all channels to deliver a comprehensive customer view across multiple channels in near real time,” said Cutting. “This translates to a more immersive, relevant and engaging experience than ever before.”

Cutting is well known as the founder of Hadoop. He developed it at Yahoo, where he and his colleagues took the MapReduce idea from Google and applied it more widely.

He said the Comparethemarket deployment is a good application of Hadoop, and typical of his company's customers: "The common theme [among Cloudera customers] is solving a problem, namely that the older, more static approach of doing a lot of analysis upfront, transforming the data according to a standard schema, then running it forever is no longer appropriate.


“The Hadoop approach is to take all your data sources, stream and store them, then start analysing. You may need to transform them according to a common representation or you may not. And if you do need to re-transform them you can do that easily because you have retained the original.

“Instead of having this one general-purpose tool of SQL to analyse – the one-hammer approach – you now have a wider variety of tools, like Impala, Hive, Solr, Spark and so on.”
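The store-first, transform-later approach Cutting describes is often called schema-on-read. The minimal Python sketch below illustrates the idea without Hadoop itself. It assumes hypothetical raw records from three channels, with hypothetical field names and hypothetical transform_v1 and transform_v2 functions. The raw records are never modified, and each version of the customer view is derived from them at read time, so a revised transformation can simply be rerun over the originals.

```python
import json
from collections import defaultdict

# Hypothetical raw events from different channels, stored exactly as received.
# Field names differ by source; nothing is normalised at write time.
RAW_EVENTS = [
    '{"channel": "web", "email": "a@example.com", "quote": "car"}',
    '{"channel": "app", "user_email": "A@Example.com", "product": "home"}',
    '{"channel": "call_centre", "email": "b@example.com", "quote": "travel"}',
]


def transform_v1(raw):
    """First-pass view: only understands the web and call-centre field names."""
    record = json.loads(raw)
    return record.get("email"), record.get("quote")


def transform_v2(raw):
    """Revised view: also reads the app's field names and normalises email case."""
    record = json.loads(raw)
    email = record.get("email") or record.get("user_email")
    product = record.get("quote") or record.get("product")
    return (email.lower() if email else None), product


def customer_view(raw_events, transform):
    """Apply a transformation at read time and group products by customer."""
    view = defaultdict(list)
    for raw in raw_events:
        customer, product = transform(raw)
        if customer is not None:  # records the transform cannot key are skipped
            view[customer].append(product)
    return dict(view)


if __name__ == "__main__":
    print(customer_view(RAW_EVENTS, transform_v1))
    print(customer_view(RAW_EVENTS, transform_v2))
```

Running it prints a first view that misses the app record, because that channel uses different field names, and a second view that merges it with the web record for the same customer. Because the originals were retained, fixing the view meant writing a new transformation rather than re-collecting data. In a Hadoop deployment the same pattern would apply to raw files processed with tools such as Hive or Spark, but this sketch does not use those APIs.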

Cutting said the value Comparethemarket gets from Hadoop goes beyond the cost savings that come from using commodity hardware. "That is still common, the first foot in the door. Other benefits, such as the ability to integrate multiple data sources and innovate applications more rapidly, can take longer to see."

He said Cloudera has banking customers with many lines of business, including multiple retail banks that may be the result of acquisitions. Even within those retail banks, different offers often run on separate back-end systems, so getting one overall view of the customer is very valuable. Having a platform on which to build and maintain that view as the business changes brings a benefit right away.

“Another example is investment banks wanting to get a better handle on their exposure to risk, globally. That can be very difficult, but might be required for compliance,” he said.

“A classic approach to these problems can be very inflexible. Having an approach where you save all the data in its raw form and then start combining it is resilient to change.

“We never get things right the first time. The more you have a platform that lets you rapidly go back, and has the capacity to process all the data quickly and reprocess it as needed, the better – it is then not a big deal.”

In a statement, James Lomas, IT director at Comparethemarket.com, said: “We want to make it really easy for people to save time and money across car, travel, home, life insurance and their energy bills – and data is key in achieving this.

“How to get all the disparate data into one manageable location and then gain value from the data are two very common problems. We also wanted to radically speed up the time in which we could translate our learnings into benefits for our customers,” he said.
