Big data best practice means a best-of-breed approach
This is a guest blogpost by Sebastian Darrington, UK MD at Exasol
The days of operating a single-vendor IT software estate are behind us. Such is the pace of innovation and change that putting all your eggs in one basket simply won’t do.
Businesses need the ability to mix and match, leveraging the very best new and existing technologies for the task at hand. Nowhere is this more clearly the case than with big data, analytics and the cloud.
Data brings clear benefits to any business that harnesses it, from greater insight and more informed decision-making to more efficient operation and execution. Big data that is used correctly can unlock opportunities far beyond those achievable with individual silos of information. However, achieving this requires the right tools, interoperability between systems and the organisational buy-in to use them properly. This is why the market for data warehouses alone is forecast to grow to $20 billion by 2022, according to Market Research Media, with the wider market for enterprise data management set to hit $100 billion by 2020, according to MarketsAndMarkets.
Single vendor vs multi-vendor
A data warehouse is a system used for reporting and data analysis, and is considered a core component of business intelligence. According to Wikipedia, “DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for knowledge workers throughout the enterprise.”
The implication, and indeed the precedent, has been that a data warehouse means a single data store into which data is ingested, which in turn means a single-vendor solution.
The burden of single-vendor lock-in, and the inevitable compromises that approach brings with it, are untenable in today’s agile, decentralised and increasingly data-heavy IT world. It is unlikely that any organisation can extract maximum performance, functionality and features from a single-vendor solution that has been designed to appeal to the broadest possible customer base. The emergence of highly customised data warehouses in the cloud and of data warehouse as a service (DWaaS) is a case in point, as organisations demand lower latency, higher-speed analytics, cost-effective processing and the ability to scale on demand.
Back in 2012 Gartner introduced the term “Logical Data Warehouse”. The idea was that you didn’t have to have a single data store, but could instead leverage best-of-breed data stores such as Hadoop or NoSQL technologies and present them as a single aggregated data source, without necessarily having to ingest the data first.
The idea has evolved over the last five years, but the fundamental premise remains: organisations investing in data warehousing need to architect their solution around a best-of-breed, multi-vendor strategy. Such a strategy allows for good deal-making and competitive tendering among vendors vying for your custom, for cost- and time-effective incremental change, and for the most extensive level of compatibility between systems and processing platforms. Doing so allows the resource to grow and shift with the business, rather than become a fixed-point release that ultimately ages into an impediment to progress.
Making the pieces fit
An effective data warehouse is the beating heart of your data strategy. It consolidates or aggregates various raw data formats and multiple sources through a single presentation layer, which is why interoperability is so critical. This centralised data hub can then be used to correlate data and deliver a single version of the truth, either in near real time or on a periodic basis, to all data consumers within an organisation, whether they be BI analysts, data scientists, line-of-business users, analytics engines, visualisation systems, marketing communications platforms or even AI algorithms.
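As a rough illustration of a single presentation layer over separate stores, here is a minimal sketch in Python. It is not how any particular product works: SQLite simply stands in for two independent data stores, and the names `sales_store`, `web_store`, `orders`, `visits`, `build_logical_view` and `customer_summary` are hypothetical, invented for this example. The point is only that consumers query one connection while the data stays in its own store.

```python
import sqlite3


def build_logical_view():
    """Return one connection that fronts two separate (hypothetical) stores."""
    # The main connection plays the role of the presentation layer.
    conn = sqlite3.connect(":memory:")

    # Each ATTACH creates its own in-memory database; these stand in for
    # independent stores that are attached rather than merged into one schema.
    conn.execute("ATTACH DATABASE ':memory:' AS sales_store")
    conn.execute("ATTACH DATABASE ':memory:' AS web_store")

    conn.execute("CREATE TABLE sales_store.orders (customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE web_store.visits (customer TEXT, pages INTEGER)")

    # Illustrative sample rows only.
    conn.executemany(
        "INSERT INTO sales_store.orders VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
    )
    conn.executemany(
        "INSERT INTO web_store.visits VALUES (?, ?)",
        [("acme", 7), ("globex", 3)],
    )
    return conn


def customer_summary(conn):
    """Correlate data across both stores with a single query."""
    query = """
        SELECT o.customer, SUM(o.amount) AS total_spend, v.pages
        FROM sales_store.orders AS o
        JOIN web_store.visits AS v ON v.customer = o.customer
        GROUP BY o.customer, v.pages
        ORDER BY o.customer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in customer_summary(build_logical_view()):
        print(row)
```

In a real deployment the attached stores would be different systems reached through connectors, but the consumer-facing shape is the same: one query surface, several underlying sources.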
If you utilise best-of-breed standalone components, they need to talk to each other, as well as to the primary data store. With the emergence of the internet of things (IoT), data platforms are becoming more fragmented as the sources of data grow in number. Whether you build your data warehouse stack on cloud platforms such as Azure or AWS, or around visual analytics systems from the likes of Tableau, Qlik and MicroStrategy, the core of a best-of-breed data warehouse needs to be a database that can work across a wide variety of complementary applications, that can straddle on-premise and cloud services, and that does not insist on a single-vendor investment or data format strategy. Ideally, a good logical data warehouse strategy will complement the systems already operating in the business, maximising the return on investment in them rather than displacing them.
These core component decisions will also need to take into account where the data is coming from, how much of it there is, how frequently it updates, how frequently it needs to be analysed and what else needs to be done with the data in order to extract value from it. From here you then have a base for adding on other commoditised and custom components and features that will bolster enterprise management, data visibility and value extraction.
Staying up-to-date without disrupting
The analytics space is evolving at breakneck speed, so much so that any integrated analytics solution is going to be out of date in less than the three-year lifecycle employed for most enterprise IT systems. Being able to swap individual solutions in and out of a data warehouse allows for efficient and cost-effective development and expansion, while avoiding lock-in to obsolete systems and code bases simply because other parts of a single-vendor system rely on them.
Good enough no longer cuts it
Ultimately, each solution needs to be proven and a market leader with regard to the functions you need. Good enough won’t cut it and will ultimately hold back the rest of the system. It is beyond any one vendor to produce a fully featured, completely flexible, cost-effective logical data warehouse that will evolve the way every organisation needs. A decentralised, commoditised and component-based data warehouse, built to the specific needs of the organisation, will be best placed to deliver better performance, easier customisation and gradual evolution that keeps pace with innovation across the board, rather than with the roadmap of a single locked-in vendor.