Data engineering - Pryon: Turning chaos into clarity
This is a guest post for the Computer Weekly Developer Network written by Josh Goldenberg in his role as VP of product at Pryon.
Pryon is known for its technology that enables government agencies to introduce secure generative AI solutions that deliver accurate and timely answers from trusted content.
The company says that rapid access to authoritative content jumpstarts stalled workflows, cuts down on repetitive tasks and democratises access to knowledge.
Goldenberg writes in full as follows…
Generative AI has been making headlines for its potential to revolutionise the way we think, work and solve problems, with McKinsey projecting it will contribute up to $4.4 trillion to the global economy annually.
However, many of the organisations rallying to seize this opportunity will be unsuccessful, with some analyst groups (Gartner) predicting a 50% failure rate for gen-AI deployments in the enterprise. The gulf between the expectation of massive competitive advantage and the reality of expensive, ineffective pilots can be bridged by effective data engineering.
Harvard Business Review found that 48% of respondents cited data readiness (issues with siloed, duplicated, untrustworthy or poor-quality data) as their top obstacle to AI adoption, and 91% said that a reliable data foundation is essential to deploying AI successfully.
Hello, data engineering
Data engineering is the discipline that takes raw, unstructured data and transforms it into actionable, high-value insights. Without a strong data foundation, the one in three enterprises planning to spend an average of $10M on AI projects next year alone are setting themselves up for failure.
As data creation accelerates – 90% of the world’s data has been generated in the last two years – engineers are tasked with more than just managing it. They have to structure, organise and operationalise data so it can actually be useful and produce the right outputs. From building reliable pipelines to ensuring data quality, engineering teams play the central role in building systems that solve real problems. The art and science of data engineering is about understanding what data you have, where it lives, where it’s going and how to turn it into an actionable strategic asset.
What’s in your backyard?
Step one is discovery – taking stock of what data you actually have. This sounds obvious, but in sprawling IT ecosystems, data sprawls too. It lives in silos, on servers, in the cloud and sometimes in places no one thought to look. Structured data, which has occupied most of the enterprise IT mindshare for the past few decades, has warehouses, lakes, lakehouses and all manner of established best practices to sort, surface, clean and move it (traditionally, ETL).
Unstructured data, which comprises 80-90% of many organisations’ data and is the source for most generative AI use cases today, is lost in a wilderness of SaaS applications with highly variable data formats and metadata, PDFs that are not digitally native, poorly versioned SharePoint files, data stores and all manner of messy bookkeeping across an organisation.
Starting a gen-AI project without knowing where your data is or what it represents is like trying to write a book without knowing the alphabet. Time invested in understanding the topology of where your data lives, the ontology of what it looks like and the clearly defined end-user applications it must serve to become ‘AI ready’ will pay dividends.
Proper retrieval and ingestion methods can surface both the obvious assets – customer records, sales reports, operational metrics – and the hidden gems buried in log files, archived emails and overlooked repositories.
These capabilities are critical for ensuring no valuable resource remains untapped.
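As an illustration, a discovery pass can start as automated inventory-building: crawl each known source, record what you find and let the gaps tell you where to look next. The sketch below assumes a couple of hypothetical filesystem locations; a real deployment would also crawl SaaS applications, SharePoint sites and object stores through their own connectors.

```python
import json
from pathlib import Path
from datetime import datetime, timezone

# Hypothetical locations to scan; real deployments would also crawl
# SaaS APIs, SharePoint sites and object stores via dedicated connectors.
SOURCES = [Path("/data/exports"), Path("/data/archive")]

def inventory(sources):
    """Walk each source and record basic facts about every file found."""
    records = []
    for root in sources:
        if not root.exists():
            continue
        for path in root.rglob("*"):
            if path.is_file():
                stat = path.stat()
                records.append({
                    "path": str(path),
                    "format": path.suffix.lower() or "unknown",
                    "bytes": stat.st_size,
                    "modified": datetime.fromtimestamp(
                        stat.st_mtime, tz=timezone.utc).isoformat(),
                })
    return records

if __name__ == "__main__":
    print(json.dumps(inventory(SOURCES), indent=2))
```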
Once you know what you have, the next challenge is organising it in a way that makes sense for the range of end-user applications you intend to support.
That’s where ontologies come in.
Ontologies: oh, you got an ology!
At its core, an ontology is a set of concepts and categories in a subject area or domain that shows their properties and the relations between them; typically a structured, repeatable, consistent way of describing what your data means and how it’s connected. Think of it as building a map (commonly referred to as a knowledge graph) that shows what your data inventory is and what it represents in relation to other data assets.
Ingestion and pre-processing capabilities are key to creating this mapping index automatically, parsing noisy, unstructured data into semi-structured, structured and searchable formats.
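As a minimal sketch of what that mapping can look like, the example below defines a toy ontology of concepts and allowed relations, then builds a small knowledge graph that rejects anything the ontology does not describe. The concept and relation names are purely illustrative; production ontologies are far richer and are often expressed in standards such as OWL or SKOS.

```python
from dataclasses import dataclass, field

# A toy ontology: concepts and the relations allowed between them.
# Real ontologies carry classes, properties and constraints.
CONCEPTS = {"Customer", "Contract", "Invoice", "Document"}
RELATIONS = {("Contract", "signed_by", "Customer"),
             ("Invoice", "billed_to", "Customer"),
             ("Document", "describes", "Contract")}

@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)   # node id -> concept
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_node(self, node_id, concept):
        assert concept in CONCEPTS, f"unknown concept: {concept}"
        self.nodes[node_id] = concept

    def add_edge(self, src, relation, dst):
        triple = (self.nodes[src], relation, self.nodes[dst])
        assert triple in RELATIONS, f"relation not in ontology: {triple}"
        self.edges.append((src, relation, dst))

kg = KnowledgeGraph()
kg.add_node("acme", "Customer")
kg.add_node("msa-2024", "Contract")
kg.add_edge("msa-2024", "signed_by", "acme")
```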
This mapping process turns a chaotic data ecosystem into something that can be queried with precision. It’s not just about storage; it’s about people and systems finding the data they need, exactly when they need it. As we move from a conversational paradigm, where data is delivered to a human who can review and interpret the output, to an agentic paradigm, where retrieved results feed directly into automated workflows, the effectiveness of your data preparation and the accuracy of your retrieval are non-negotiable.
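The retrieval step itself can be sketched very simply. The example below ranks text chunks by naive keyword overlap purely to show the shape of the interface; a production system would use vector or hybrid search over the parsed, indexed content described above, and the sample corpus here is invented for illustration.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Naive relevance: count query-term overlap, weighted by frequency."""
    q, d = Counter(tokens(query)), Counter(tokens(document))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, corpus, k=3):
    """Return the top-k chunks for a query; a stand-in for vector search."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

corpus = [
    "Q3 sales report for the EMEA region",
    "Master services agreement signed by Acme in 2024",
    "Runbook for restarting the ingestion pipeline",
]
print(retrieve("acme services agreement", corpus, k=1))
```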
Make sense of it
Discovery and mapping give you the pieces, but synthesis is where you start assembling the puzzle. It’s the process of interrogating the data to ask meaningful questions and find useful patterns. What story does the data tell? How does it align – or conflict – with your business objectives?
We can use retrieval plus generative technology, grounded in our ontologies and known prior knowledge, to assist in this interrogation. We can begin to identify gaps in our knowledge and areas of contradiction, or create focus and reduce unnecessary duplication. This technology can help synthesise information into insights you can use, making sense of your data, connecting dots and highlighting patterns that would be impossible for humans to identify alone.
These capabilities transform raw data into strategic intelligence, structured both for human legibility and for actionable agentic machine workflows.
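One way to picture the synthesis step is as grounded prompt assembly: the retrieved passages and the known facts from the knowledge graph are placed in front of the generative model, with an explicit instruction to flag gaps and contradictions rather than guess. The sketch below is illustrative only; the question, facts and downstream model call are placeholders for whatever an organisation actually deploys.

```python
def build_synthesis_prompt(question, retrieved_chunks, graph_facts):
    """Ground the model on retrieved passages and known ontology facts,
    and ask it to flag gaps or contradictions rather than guess."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    facts = "\n".join(f"- {s} {r} {t}" for s, r, t in graph_facts)
    return (
        "Answer using only the context and known facts below.\n"
        "If the context is missing or contradictory, say so explicitly.\n\n"
        f"Known facts:\n{facts}\n\nContext:\n{context}\n\n"
        f"Question: {question}\n"
    )

prompt = build_synthesis_prompt(
    "Which customer signed the 2024 master services agreement?",
    ["Master services agreement signed by Acme in 2024"],
    [("msa-2024", "signed_by", "acme")],
)
# The prompt would then be sent to whichever generative model the
# organisation has approved; the call itself is deployment-specific.
```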
Turning insights into impact
Data synthesis is interesting, but taking action is paramount. The final step is putting it to work. Whether that means automating workflows, making real-time decisions, or delivering predictive insights, this is where the rubber meets the road.
Agentic orchestration can enable systems to take the synthesised insights and act on them autonomously or with minimal human input. These engines bridge the gap between theory and practice, ensuring that your data doesn’t just sit idle – it drives measurable outcomes.
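A minimal sketch of such orchestration is a loop that routes each synthesised insight to a registered action, with an optional human-approval gate. The insight types and action handlers below are hypothetical; a real agentic platform would wrap the same core pattern in observability, retries and policy controls.

```python
# Hypothetical action registry: each insight type maps to a handler.
def open_ticket(insight):
    print(f"ticket opened: {insight['summary']}")

def notify_owner(insight):
    print(f"owner notified: {insight['summary']}")

ACTIONS = {"data_gap": open_ticket, "contradiction": notify_owner}

def orchestrate(insights, require_approval=True):
    """Route each synthesised insight to an action, optionally gated
    by a human approval step before anything is executed."""
    for insight in insights:
        handler = ACTIONS.get(insight["type"])
        if handler is None:
            continue  # unknown insight types are left for human review
        if require_approval and input(f"run {handler.__name__}? [y/N] ") != "y":
            continue
        handler(insight)

orchestrate(
    [{"type": "data_gap", "summary": "No invoices found for Acme in Q2"}],
    require_approval=False,  # set True to keep a human in the loop
)
```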
Don’t put the cart…
Everyone is enamoured with generative AI and state-of-the-art model releases, often overlooking that it is the data foundation that will make or break your use case (and the investment you have made in it). True data engineering for gen-AI workloads goes beyond parsers and loaders: it should make use of robust and flexible ingestion, effective retrieval, generative insights and clear, observable orchestration, with articulated intent on how you are going to take action on your data as your foundational strategic asset.
An end-to-end tuned and optimised agentic platform is the fastest and most reliable way to achieve these goals across a myriad of use cases today.