The Enterprise Data Fabric: an information architecture for our times

This is a guest blogpost by Sean Martin, CTO and co-founder, Cambridge Semantics

The post-big data landscape has been shaped by two emergent, intrinsically related forces: the rise of cognitive computing and the emergence of the data fabric architecture. The latter is an overlay atop an organization's assortment of existing distributed computing technologies, tools and approaches, enabling them to interact for individual use cases across the enterprise.

Gartner describes the data fabric architecture as the means of supporting “frictionless access and sharing of data in a distributed network environment.” The data fabric architecture joins these decentralized data assets and their respective management systems.

Although this architecture may involve any number of competing vendors, graph technology and semantic standards play a pivotal role in its implementation. By providing business meaning to data and flexibly integrating data sources of any structural type, this tandem delivers rapid data discovery and integration across distributed computing resources.

Together, they are the means of understanding and assembling heterogeneous data across the fabric that make this architecture work.

Drivers

The primary driver behind the data fabric architecture is the limitations of traditional data management options. Hadoop-inspired data lakes can successfully co-locate disparate data, but struggle to actually find and integrate datasets. The more data that disappears into them, the more difficulty organizations have governing them or extracting value. These options can sometimes excel at cheaply processing vast, simple datasets, but have limited utility when operating over complex, entity-laden data, which restricts them to only the simplest integrations.

Data warehouses can offer excellent integration performance for structured data, but they were designed for the slower pace of the pre-big data era. They are too inflexible and difficult to change in the face of the sophisticated and ever-increasing demands of today's data integrations, and they are poorly suited to tying together the unstructured (textual and visual) data inundating enterprises today. Cognitive computing applications like machine learning require far more data and many more intricate transformations, necessitating modern integration methods.

Semantic Graphs

The foremost benefit semantic graphs bring to the data fabric architecture is seamless data integration. This approach blends not only various datasets, data types and structures, but also the outputs of entirely distinct toolsets and their supporting technologies. By placing a semantic graph integration layer atop this architecture, organizations can readily reconcile the most fundamental differences at the data and tool levels of the underlying data technologies. Whether organizations choose different options for data virtualization, storage tiering, ETL, data quality and more, semantic graph technology can readily integrate this data for any use.

The data blending and data discovery advantages of semantic graphs stem from their ability to define, standardize, and harmonize the meaning of all incoming data. Moreover, they do so in terms comprehensible to business end users, fostering an intuitive understanding of the relationships between data elements. The result is a rich, contextualized understanding of data's interrelations for informed data discovery, culminating in timely data integrations for cutting-edge applications and analytics like machine learning.

With the Times

Although the data fabric architecture encompasses a multiplicity of approaches and technologies, semantic graphs can integrate these technologies and their data for nuanced data discovery and timely data blending. This approach adapts to modern data management demands and positions the data fabric architecture as the most suitable choice for today's decentralized computing realities.

The knowledge graph by-product of these integrations can be quickly spun up in containers and deployed in any cloud or hybrid cloud setting that best suits germane factors such as compute capability, regulatory compliance, or pricing. With modern pay-on-demand cloud delivery mechanisms, in which APIs and Kubernetes software enable users to automatically position their compute where needed, the data fabric architecture is becoming the most financially feasible choice for the distributed demands of the modern data ecosystem.