Commerzbank creates Hadoop-based platform for business-critical insights

German bank creates an entire new department to build and support a platform to provide business-critical insights

Commerzbank is a year into a project to create a centralised data platform that can be used by all departments to help them make better business decisions.

The project is already slashing the time taken to glean insights from huge volumes of often complex data, and is giving businesspeople access to capabilities previously available only to highly technical experts.

Whether the aim is to target a new product or service at customers, meet a regulatory change or detect fraud, understanding the data makes the task easier for businesses.

“The business need is to first harmonise the data of the bank, create a trusted data hub, and on top of this, create insights and analytics solutions that will drive new revenue sources or create cost efficiencies or loss prevention,” Kerem Tomak, head of big data and advanced analytics at Commerzbank, told Computer Weekly.

Understanding the data requires investment in multiple technologies and the right skills, said Tomak. The bank decided to create a new department dedicated to this challenge. “As part of its journey to digitise, Commerzbank chose to establish a new team and department, known as big data and advanced analytics,” he said.

Tomak was hired about a year ago to build the unit from scratch. He previously worked in Silicon Valley, where he gained 11 years’ experience at companies including Google and Yahoo, focusing on analytics, its implementation and data-driven product creation.

His work in Silicon Valley also involved building teams to work with big data and analytics technology, which he is now doing at Commerzbank.

Commerzbank is a full-service bank, the second largest in Germany, with about 48,000 staff. It focuses on serving large, small and medium-sized businesses in Germany, including the sector known as the Mittelstand, which accounts for a large portion of the huge German manufacturing sector.

Tomak’s team does not fall under IT but, like IT, serves the entire business.

Past experience helps technology choice

Tomak’s time in Silicon Valley gave him experience with the kind of technologies banks are just beginning to look into and implement.

For example, he selected the Hadoop open source environment to build the platform because of his past experience with the technology. “I am kind of biased based on my background in Silicon Valley. I worked with Yahoo, which created the environment that became Hadoop. It was an internal system at the time but then was made open source,” he said.

“Over the years, I have built Hadoop environments for other companies in Silicon Valley.”

But bringing such a platform to a business outside the tech sector involves changing the way it is used. To this end, Commerzbank’s central platform uses multiple technologies on top of Hadoop to help users who are not tech experts.

“At Yahoo the work was very technical,” said Tomak. “You had to write your own processes and scripts to use the data.”

“It would take days or even weeks to put together a unified view of the data or a report that a line manager or executive needed. I was looking for fast and visual tools to enable us to do things much faster than writing code,” he said.

Refined data

This is where software from Trifacta came in. The self-service data preparation platform allows the people who best understand the data to take it from raw to refined. It does this by treating data preparation as a user experience problem and using machine learning to automate much of the complicated work that used to require code.

The people who understand the business context can work directly with big, complex data sets, blending and structuring the information.

Enabling more than just data scientists to use the system is vital because there is a wide range of potential users, from highly tech-savvy data engineers and data scientists to business analysts, risk compliance officers and fraud investigators.

The Commerzbank data platform provides self-service access for all the bank’s departments. “We are working to get people from across the bank to use the central platform to gather the insights they need,” said Tomak, adding that this can be ad hoc.

Recruitment challenge

But there is a major skills challenge when it comes to getting the right staff to meet the demands on Tomak’s team. There are currently about 50 people in the team, but this will increase to 100 by the end of this year.

This is not easy because the candidate profile needed is changing. “The skills are shifting from pure coding skills to really understanding the business context,” said Tomak. “If you really want to solve a business problem you have to understand the business context as well as the data.”

“These are the skills I am hiring now. But getting them is always competitive because every industry is looking for them,” he said.

More to come

While a lot has already been achieved through the centralised platform, there is an opportunity to scale it up across the bank.

“We have the environment up and running, we have collected about 150 terabytes of data, we have Trifacta in the environment to check data quality and provide use cases for auditing and fraud detection, but there is more to come,” said Tomak.

Going forward, Commerzbank will continue to collect data from different parts of the bank and harmonise it, increase automation, and build an engine on the data lake that uses tools to streamline data processing and the creation of insights.
