Imagine trying to have a meaningful conversation about a library’s worth of technical documentation.
This is the reality many organizations face when implementing AI-powered document chat solutions.
While modern Large Language Models (LLMs) continue to expand their context windows, they still hit fundamental limits when dealing with truly massive document repositories — think millions of pages in energy, mining, or healthcare sectors.
Our initial approach at AgileDD addresses this challenge through intelligent metadata extraction.
By automatically capturing and indexing key information from documents, we enable more precise retrieval before a single question is asked.
But what if we could go further?
What if we could transform the document text itself to make it more digestible for AI systems?
This was the hypothesis that sparked a recent collaboration with four talented students from IFP School.
The concept was straightforward yet powerful: rather than feeding raw text into LLMs, what if we restructured the information to highlight the most valuable elements?
Think of it like the difference between reading a dense textbook and studying a carefully highlighted version with key concepts emphasized.
We all know which approach leads to better comprehension.
For AI systems, this highlighting comes in the form of structured data formats like XML or JSON.
These formats transform chaotic tables and unstructured content into clearly defined, machine-readable information that preserves both the data and its relationships.
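To make the idea concrete, here is a minimal sketch of the difference. The sample values and field names below are invented for illustration and do not come from the actual mining reports; the point is that a JSON record keeps each number tied to its column and unit, while a flat extraction loses those relationships.

```python
import json

# A table row flattened into prose: the numbers survive, but the reader
# (human or LLM) must guess which value belongs to which column and unit.
raw_extraction = "Zone A 2.4 g/t Au 150 m 48.231 -79.014"

# The same row as a structured record: every value is labeled and typed.
structured = {
    "zone": "Zone A",
    "gold_grade": {"value": 2.4, "unit": "g/t Au"},
    "depth": {"value": 150, "unit": "m"},
    "coordinates": {"latitude": 48.231, "longitude": -79.014},
}

print(json.dumps(structured, indent=2))
```

A model answering "what is the gold grade in Zone A?" can match the `gold_grade` key directly instead of inferring which of several bare numbers is the grade.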
To test this approach, we invited a group of four IFP School students — Aïcha, Stephania, Juan, and Samuel — to work with us on a three-week project.
Their goal was ambitious: determine whether enriched text could meaningfully improve the accuracy of AI chat responses when dealing with complex technical documentation.
The students leveraged AgileDD’s API library to extract text from mining assessment reports and convert tables into structured formats.
These reports, provided by Quebec’s MNRF, served as perfect test cases — containing both narrative descriptions and data-heavy tables with critical information about mineral deposits.
“What surprised me most was how quickly we were able to implement and test our ideas using the AgileDD APIs,” said Aïcha. “Within days, we had a functioning benchmark that allowed us to measure the impact of text enrichment on response accuracy.”
Juan added, “The biggest challenge was determining the right approach to table extraction. Mining reports contain complex tables with critical data that needs to be preserved in a structured way. Finding the right balance between preserving raw data and adding context was crucial.”
The results exceeded expectations. Initial tests showed accuracy rising from 70% with a standard RAG approach to 80% with enriched text structures — a ten-percentage-point gain. The improvement was particularly noticeable when queries involved numerical data originally contained in tables.
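The students' benchmark harness is not published in detail, but the scoring behind such a comparison can be sketched as follows. The questions, answers, and match rule below are invented stand-ins; a real benchmark would use graded or fuzzy matching rather than exact string equality.

```python
def accuracy(predicted, expected):
    """Fraction of predicted answers that exactly match the reference answers."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Invented reference answers and pipeline outputs for illustration only.
expected = ["2.4 g/t", "150 m", "Zone A", "48.231"]
baseline_answers = ["2.4 g/t", "unknown", "Zone A", "48.2"]    # raw-text RAG
enriched_answers = ["2.4 g/t", "150 m", "Zone A", "48.231"]    # enriched text

print(f"baseline: {accuracy(baseline_answers, expected):.0%}")
print(f"enriched: {accuracy(enriched_answers, expected):.0%}")
```

Running the same question set through both pipelines and comparing the scores is what lets a three-week project produce a defensible number rather than an impression.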
“The difference was dramatic for queries about specific mineral concentrations or geographic coordinates,” explained Stephania. “When this information was properly structured in a machine-readable format, the AI could access it much more reliably than when it was buried in a standard text extraction.”
Samuel observed that “the enriched text approach essentially gave the AI a structured knowledge base rather than just text passages. This makes a fundamental difference in how it processes and responds to technical queries.”
While our initial testing focused on mining documentation, the implications extend across all industries with complex technical documentation.
The energy sector, with its wealth of well logs and geological reports, faces similar challenges.
Healthcare organizations dealing with structured medical data embedded in narrative clinical notes could benefit from the same approach.
The key insight is that technical documents often contain a mix of narrative text and structured data. Traditional text extraction treats everything as prose, losing the inherent structure of tables, charts, and specialized formats.
By preserving and enhancing this structure, we enable more accurate AI interactions.
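One way to act on this insight is to route content by kind during extraction: narrative passages pass through as prose, while anything detected as a table is serialized to JSON before it reaches the retriever. The sketch below assumes a hypothetical upstream extractor that emits `(kind, payload)` pairs; the block structure and sample data are illustrative, not AgileDD's actual pipeline.

```python
import json

def enrich(blocks):
    """Turn extractor output into retrieval chunks, preserving table structure.

    blocks: list of (kind, payload) pairs, where kind is "text" or "table".
    A "table" payload is {"headers": [...], "rows": [[...], ...]}.
    """
    chunks = []
    for kind, payload in blocks:
        if kind == "table":
            # Rebuild each row as a labeled record instead of flat prose.
            records = [dict(zip(payload["headers"], row)) for row in payload["rows"]]
            chunks.append(json.dumps(records))
        else:
            chunks.append(payload)  # narrative text passes through unchanged
    return chunks

blocks = [
    ("text", "The deposit lies 12 km north of the camp."),
    ("table", {"headers": ["sample", "Au g/t"],
               "rows": [["S-01", "2.4"], ["S-02", "0.8"]]}),
]
for chunk in enrich(blocks):
    print(chunk)
```

The narrative chunk stays readable prose, while the table chunk arrives at the LLM with its column labels intact.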
The experiment with IFP School students provided compelling evidence for our enriched text approach.
An initial ten-percentage-point improvement in accuracy represents thousands of correctly answered queries in enterprise settings — the difference between a merely helpful system and a mission-critical one.
The students also proposed guidelines for further improvements, suggesting that accuracy could be pushed even higher with additional refinements to the text enrichment process.
As LLMs continue to evolve, our focus remains on finding practical ways to maximize their effectiveness with real-world documentation.
The enriched text approach represents a significant step forward — one that builds on our human-guided philosophy by structuring information in ways that bridge the gap between human and machine understanding.
We extend our sincere thanks to Aïcha, Stephania, Juan, and Samuel for their contributions to this research. Their fresh perspectives and technical skills helped validate a concept that will benefit AgileDD users across multiple industries.
The next time you chat with your organization’s documentation, remember that the quality of the conversation depends not just on the AI model, but on how the information is prepared.
Better AI interactions require both powerful models and intelligently structured data.
When we focus on preparing information thoughtfully, we unlock the full potential of AI for document intelligence.