Interview: The importance of building a data foundation
We speak to Terren Peterson, Capital One’s vice-president of engineering, about how data pipelines and platforms are essential for AI success
Everything in IT gets commoditised eventually, according to Terren Peterson, vice-president of engineering at Capital One. Having worked at the bank for over 24 years, Peterson has first-hand experience of how commoditisation has impacted the business of IT.
“On IT projects, we used to spend so much time wrestling with IT infrastructure. But eventually, things went into the cloud. All that IT infrastructure stuff got commoditised,” he says.
Yet while IT infrastructure has indeed been commoditised and simplified, the data management challenges organisations face have only grown in complexity. “Things like data, pattern solving, and general problem solving and strategy still stay on the plate,” he says. “It’s easier to commoditise software and very easy, on some level, to commoditise infrastructure. But it’s hard to commoditise data.”
That said, Peterson is confident that, over time, fewer and fewer data management tasks will remain on the proverbial plate. “It’s a reductive exercise, where things get commoditised over time and get taken off the plate,” he says.
For Peterson, what is ever-present is the need for an enterprise data architecture. Such an architecture starts with engendering a data culture in the organisation, but he believes there is a bit of a disconnect: “You can have a data strategy, but do you have a data culture?”
A recent survey from Morning Consult for Capital One found that only 35% of the 4,000 people polled believe their organisation has a strong data culture. Meanwhile, over a fifth say their organisation lacks a strong data culture or that there is inconsistent leadership support, talent development and education around data.
“To a certain extent, a data culture is a proxy for discipline,” says Peterson. “Are you prioritising time for data management? Are you collecting good metadata? Are you doing data quality checks? Are you ensuring that the data is really well understood? And have you taken the time to standardise the data?”
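To make the idea of a routine data quality check concrete, the following is a minimal sketch in Python. It is not drawn from Capital One's tooling: the function name `quality_report`, the field names and the sample rows are all hypothetical, and the only check shown is counting missing or empty required fields.

```python
def quality_report(rows, required_fields):
    """Count, per required field, the rows where the value is missing or empty.

    Hypothetical helper for illustration only; a real check would cover
    far more (types, ranges, duplicates, freshness).
    """
    issues = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            # Treat an absent key, None, or an empty string as a quality issue.
            if row.get(field) in (None, ""):
                issues[field] += 1
    return issues


# Made-up sample data: one empty account_id, one missing opened date.
rows = [
    {"account_id": "A1", "opened": "2021-03-04"},
    {"account_id": "", "opened": "2022-07-19"},
    {"account_id": "A3"},
]
report = quality_report(rows, ["account_id", "opened"])
print(report)
```

Running a check like this on every load, rather than occasionally, is one way the "discipline" Peterson describes might show up in practice.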
Standardisation of data
In a company like Capital One, with multiple lines of business, having a standardised language is essential for effective communication across the whole company. This applies equally to data. As a data engineer, Peterson sees a significant role for data platforms in providing a standard for data across the organisation.
“We see platforms as central to data engineering,” he says. For instance, moving data from point A to point B requires a data pipeline. Having a data platform removes the need for everybody in the company to build their own data pipeline. Instead, the platform generalises the problem of moving data.
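As a rough illustration of what generalising the problem of moving data can look like, here is a minimal Python sketch of one shared pipeline that any team can configure instead of writing its own. Everything in it is an assumption made for the example: the `Pipeline` class, the `standardise_keys` transform and the sample record are hypothetical and do not describe Capital One's platform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict[str, object]


@dataclass
class Pipeline:
    """Hypothetical shared pipeline: moves records from a source to a sink."""
    source: Callable[[], Iterable[Record]]
    sink: Callable[[Record], None]
    transforms: tuple[Callable[[Record], Record], ...] = ()

    def run(self) -> int:
        """Apply each transform in order, write the record, return the count moved."""
        moved = 0
        for record in self.source():
            for transform in self.transforms:
                record = transform(record)
            self.sink(record)
            moved += 1
        return moved


def standardise_keys(record: Record) -> Record:
    """Normalise field names so every team sees the same naming convention."""
    return {key.strip().lower().replace(" ", "_"): value
            for key, value in record.items()}


# A team supplies only its source and destination; the movement logic is shared.
destination: list[Record] = []
pipeline = Pipeline(
    source=lambda: [{"Customer ID": 1, " Balance ": 250.0}],
    sink=destination.append,
    transforms=(standardise_keys,),
)
moved = pipeline.run()
print(moved, destination)
```

The design point is that the source, sink and transforms vary by team while `run` does not, which is the sense in which a platform can turn point-to-point data movement into a solved, reusable problem.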
In Peterson’s view, the idea of generalising the solution to a problem to make it applicable in other circumstances is a foundation of engineering. “When you generalise the problem, you go, ‘Wow, I can make a platform out of this’, then tell everybody in the company.”
The challenge, he admits, is convincing people to use the platform. “What you have to figure out is how to get your associates to direct their creative energy on the right problems, the right cases and the right parts of the problem,” he says.
There may well be hundreds of different ways to implement a data pipeline. Peterson’s advice is to try to direct people’s creativity away from areas that are served by IT. “If we want standardised data, please do not create a hundred different ways of doing the pipeline,” he says. “Use your creativity in finding all the different sources of data we can use and looking at the things we can do with the data.”
The idea of a data pipeline is rather like the enterprise service bus that gained prominence in the early 2000s when enterprises needed a standard way for applications to communicate. “The very first word is enterprise,” he says. “You really want one service bus – it’s the same thing for data platforms.”
The goal is to commoditise data standardisation, which enables people in the business to unleash their creativity. “I have all the data, that’s the magic. I think of AI [artificial intelligence], machine learning and all sorts of other things [being done] with data analytics, and this is where we want to put our creativity. We don’t want to put it into how to manage the data.”
Looking at how this has applied in his own work, Peterson says he didn’t put creative energy into trying to come up with his own data lake – it was already there. “I used my creativity to look at how I could get unique insight into the data attributes that were already there. That’s what really creative problem-solving in our businesses is like,” he says.
For instance, such problem-solving could lead to an improved fraud model or the development of a loan offer engine that helps people when they’re shopping for a new vehicle.
Growing a data platform
Looking at where to get started with a data platform, Peterson firmly believes organisations need to begin with a solid data foundation. Fortunately, many already have much of what they need as businesses have been processing data for a long time. “If it’s something you already have, then you can build upon that,” he says. “It gives you the starting point.”
However, using the analogy of a tree, he says: “When’s the best time to plant a tree? Well, it was 20 years ago. And if you didn’t plant the tree then, the second best time is to plant it today. I would encourage people not to think there’s a quick fix. You need the platform first. If you haven’t built that data platform already – if you didn’t plant the tree 20 years ago – you can start today.”
One example of using a standard data pipeline and data platform is the Capital One Auto Navigator, launched in 2023. This allows car dealers to connect with more car buyers and supports the vehicle purchasing process. Car buyers can customise details such as down payment, trade-in and term length to calculate a payment plan that works for them. This enables dealers to work more efficiently and accurately by giving them visibility into what a customer can afford based on available inventory.
An application such as Auto Navigator works by pulling together data from across the business. To take advantage of advanced analytics, AI and machine learning, Peterson says such applications require a solid data foundation and, implicitly, a data platform.
One takeaway from the conversation with Peterson is that organisations with a cloud-native IT strategy can get a cloud-based data platform up and running quickly, because they do not have to configure the storage and other IT infrastructure that an on-premises environment requires.