Data engineering - Eve: Making a case for transformed unstructured data with LLM-power
This is a guest post for the Computer Weekly Developer Network written by legal AI agent software company Eve co-founder and chief product officer Matt Noe.
Eve is the first personalised AI tool built for the legal profession that users can partner with, train and teach, just like any other member of the team.
The company’s mission is to empower plaintiff lawyers to achieve justice for their clients by streamlining manual, time-intensive work with AI technology.
Noe writes in full as follows…
Organisations of all types grapple with an overwhelming influx of unstructured data flowing through disparate systems – from legal teams processing thousands of case documents and client communications to healthcare providers managing patient records and insurance claims.
Teams still often rely on employees to manually gather, sift through and process this information, inputting it into their systems of record to initiate critical workflows.
This manual data processing challenge affects industries across the spectrum, creating significant operational bottlenecks.
Traditional ETL pipelines and data engineering approaches fall short when confronted with the semantic complexity and format diversity of real-world information flows.
However, the emergence of Large Language Models (LLMs) is revolutionising how we approach this challenge, enabling automated understanding and transformation of unstructured data into structured representations that power both human workflows and AI systems.
The data engineering challenge
The scale and complexity of modern data landscapes present unprecedented challenges for organisations.
Every day, businesses process thousands of documents across dozens of formats – from PDFs and spreadsheets to images and audio recordings.
This information flows through multiple communication channels and must be reconciled across various systems of record, each with its own data model and requirements.
Traditional data engineering approaches struggle with this complexity. While conventional ETL pipelines excel at processing structured data, they falter when confronting the ambiguity and variability of real-world information.
Rule-based systems become brittle and expensive to maintain as the variety of data sources grows. Even modern integration platforms, designed for API-driven workflows, struggle with the semantic understanding required to process natural language content effectively.
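To illustrate that brittleness, consider a deliberately simple rule-based extractor – the field names, patterns and sample text below are hypothetical. It only works for the exact layout its rules were written against and fails silently the moment the same facts arrive phrased differently.

import re

# Hypothetical rule-based extraction: pull a claim number and incident date
# from an incoming letter. The patterns are hard-coded to one known layout.
CLAIM_PATTERN = re.compile(r"Claim No\.\s*(\d{6,})")
DATE_PATTERN = re.compile(r"Date of incident:\s*(\d{2}/\d{2}/\d{4})")

def extract_claim_fields(text: str) -> dict:
    claim = CLAIM_PATTERN.search(text)
    date = DATE_PATTERN.search(text)
    return {
        "claim_number": claim.group(1) if claim else None,
        "incident_date": date.group(1) if date else None,
    }

# Works for the layout the rules were written against...
print(extract_claim_fields("Claim No. 482910\nDate of incident: 04/11/2023"))

# ...but returns nothing when the same facts arrive phrased differently.
print(extract_claim_fields("Re: your claim (ref 482910), incident on 4 November 2023"))

Every new phrasing demands a new rule, which is exactly the maintenance burden that grows with each additional data source.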
LLM-power
Large Language Models offer a fundamentally different approach to data engineering.
Rather than relying on deterministic transformation rules, LLMs can understand context and extract meaning from unstructured content, effectively turning any document into a queryable data source.
This capability enables a new architecture for data processing systems. At the foundation lies an intelligent ingestion layer that can handle diverse input sources. Unlike traditional ETL systems, this layer doesn’t just extract text – it comprehends it.
Documents, emails and conversations are processed through LLMs that understand their context and purpose, extracting not just explicit fields but implicit relationships and intentions. The transformation pipeline converts this understanding into structured representations.
Instead of forcing data into predefined schemas, the system can dynamically identify entities, relationships and attributes.
For example, when processing thousands of pages of medical records for a personal injury claim, these systems can automatically construct structured timelines of treatments, highlight accident-related complications and risk factors, and connect related diagnoses across multiple healthcare providers – tasks that would typically require hours of manual review by legal professionals and medical experts.
This flexibility allows organisations to capture the full richness of their information while maintaining consistency and queryability, whether they're analysing insurance documentation, witness statements or client communications.
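As a rough sketch of what such a transformation step might look like – the prompt wording, the schema and the call_llm stub below are illustrative assumptions, not a description of any particular product's pipeline – a model can be asked to emit structured JSON directly, which the pipeline then validates and orders into a timeline.

import json

# Illustrative target structure: an example schema, not a fixed data model.
# In practice the model can also be asked to propose the entities and
# relationships it finds, rather than fill a rigid template.
EXTRACTION_PROMPT = """From the medical records below, return only valid JSON with:
  "providers":  list of healthcare providers mentioned
  "treatments": list of {"date": "YYYY-MM-DD", "provider": ..., "description": ...}
  "diagnoses":  list of {"name": ..., "related_to_incident": true/false}
"""

def call_llm(prompt: str) -> str:
    """Stand-in for whichever LLM API an organisation uses."""
    raise NotImplementedError

def extract_structured_record(document_text: str) -> dict:
    prompt = EXTRACTION_PROMPT + "\nRecords:\n" + document_text
    raw = call_llm(prompt)
    record = json.loads(raw)  # fails loudly if the model's output is not parseable
    # Build the treatment timeline; ISO dates sort correctly as strings.
    record["treatments"].sort(key=lambda t: t["date"])
    return record

Parsing the model's output with a strict JSON parser is the simple safeguard here; a production system would typically add schema validation and retries, but the principle stands: the structure emerges from the content rather than being imposed up front.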
Powering human & AI workflows
The true power of LLM-based data engineering emerges in its ability to serve both human users and AI systems.
For human workflows, the structured representations provide immediate access to relevant information, reducing the cognitive load of searching through documents and connecting related pieces of information. This enables faster, more informed decision-making and reduces the time spent on manual data entry and verification.
For AI systems, particularly LLM-powered agents and automation workflows, these structured representations provide high-quality, consistent input that improves their performance. When an AI agent needs to answer questions or take actions, it can work with pre-processed, structured data rather than having to process raw documents from scratch. This not only improves accuracy but also reduces computational overhead and costs.
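Continuing the illustrative schema from the earlier sketch, the functions below show how a human-facing view or an AI agent could answer routine questions and derive signals straight from the structured record, rather than re-processing raw documents each time.

from datetime import date

def incident_related_diagnoses(record: dict) -> list[str]:
    # Answer a routine question straight from the structured record,
    # with no need to re-read the underlying documents.
    return [d["name"] for d in record["diagnoses"] if d["related_to_incident"]]

def treatment_gaps(record: dict, threshold_days: int = 60) -> list[tuple[str, str]]:
    # Flag long gaps between treatments: the kind of derived signal a
    # reviewer or an agent might want surfaced automatically.
    dates = sorted(date.fromisoformat(t["date"]) for t in record["treatments"])
    return [(a.isoformat(), b.isoformat())
            for a, b in zip(dates, dates[1:])
            if (b - a).days > threshold_days]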
This dual-purpose nature creates a compound effect: as data becomes more structured and accessible, both human and AI capabilities expand.
Organisations can build increasingly sophisticated workflows that combine human expertise with AI automation, each building on the foundation of well-structured, semantically meaningful data.
The future of data engineering lies in this synthesis of human and machine intelligence, powered by LLMs that can bridge the gap between unstructured information and structured data. As these systems evolve, they’ll continue to unlock new possibilities for automation, insight and innovation across industries.