9 October 2025

Why LLMs fail on financial documents and how AI extraction can finally fix it

Large language models (LLMs) are transforming many industries, but when it comes to financial documents, they fall short. In finance, accuracy defines trust. The fine print in regulatory filings, contracts, and reports drives valuations and due diligence outcomes. Yet these are exactly the materials most LLMs misread. Standard models handle clean PDFs, but struggle with charts, tables, and diagrams, breaking cell mappings, ignoring visuals, and missing links between data and context.

The challenge increases with non-searchable PDFs like scanned filings or legacy portfolio reports. These confuse text-only models, resulting in missing content and fragmented insights. In due diligence, even a misplaced figure or unread chart can change a valuation narrative.

Complex spreadsheet models add a further layer of difficulty. What matters isn’t just the output, but the logic: formulas, dependencies, and assumptions. Flattening a model into static text strips out that intelligence, making it impossible to validate sensitivities or trace drivers.

The result: incomplete analysis, slower deal cycles, and decreased confidence in AI-driven workflows. That’s why leading funds and family offices are turning to purpose-built document parsing, designed to handle unstructured, high-stakes materials with the precision real investment work requires.

How Desia’s custom parsing solution gives you confidence in your numbers

The Desia team has developed a comprehensive document parsing solution designed specifically for the challenges faced by finance professionals. Desia’s system combines Vision-Language Models (VLMs) with file-type specialization, ensuring every document, from VDRs and models to investor reports, is accurately understood, structured, and ready for analysis.

VLM parsing output from Desia's pipeline (on the right), successfully extracting and interpreting charts, graphs, tables, and text from documents (on the left). The structured markdown output with contextual metadata provides high-quality input for downstream LLM processing and knowledge extraction.

How does it work? Inside Desia’s VLM-powered parsing pipeline

By combining VLM-powered extraction, contextual enrichment, and spreadsheet logic preservation, Desia delivers end-to-end parsing built for the real workflows of investment professionals. It accelerates due diligence, reduces manual review, and gives teams confidence in the integrity of AI-driven analysis.

Diagram representing the high-level Desia document parsing process, built as a multi-stage pipeline that balances generality with file-type specialization.


Smart file understanding

Desia automatically classifies and routes each file type (PDF, Word, Excel, image, and text) through its optimal processing path, ensuring every document is handled based on its actual internal structure and content, not just the file extension.
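
As a rough illustration of what content-based routing can look like (not Desia's actual implementation), the sketch below inspects magic bytes and archive contents instead of trusting the file extension; the routing labels are assumptions made for the example.

```python
from pathlib import Path
import zipfile

def classify(path: Path) -> str:
    """Route a file by its actual content rather than its extension (illustrative labels)."""
    head = path.read_bytes()[:8]
    if head.startswith(b"%PDF"):
        return "pdf"
    if head.startswith(b"\x89PNG") or head.startswith(b"\xff\xd8\xff"):
        return "image"                          # PNG or JPEG magic bytes
    if zipfile.is_zipfile(path):                # modern Office files are ZIP containers
        names = zipfile.ZipFile(path).namelist()
        if any(n.startswith("xl/") for n in names):
            return "excel"                      # .xlsx content lives under xl/
        return "office"                         # .docx / .pptx and similar
    try:
        path.read_text(encoding="utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"                        # hand off to a generic fallback path
```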

Conversion

Documents are normalized into standardized intermediate representations. Office documents are converted into PDFs in isolated, resource-constrained environments to preserve layout and formatting.

  • PDFs go through text extraction first, followed by page-to-image conversion.
  • Excel spreadsheets are handled through a dedicated parsing path that preserves formulas, data types, and sheet relationships.
  • Images are routed directly into the vision-language processing stage.
  • Plain text files are segmented into chunks based on content structure and token limits.

This approach provides uniformity for downstream processing while preserving the structural and semantic integrity of each file type.
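
To make the PDF branch concrete, a minimal sketch using PyMuPDF (an assumed library choice, not necessarily Desia's) could pair each page's native text layer with a rendered image for the vision models:

```python
import fitz  # PyMuPDF

def pdf_to_intermediate(path: str, dpi: int = 200) -> list[dict]:
    """Pair each page's extracted text with a rendered image of the same page."""
    pages = []
    with fitz.open(path) as doc:
        for i, page in enumerate(doc):
            pages.append({
                "page": i + 1,
                "text": page.get_text("text"),                         # native text layer (empty for scans)
                "image_png": page.get_pixmap(dpi=dpi).tobytes("png"),  # image input for the VLM stage
            })
    return pages
```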

Parsing

Documents are processed through an orchestration layer that coordinates multiple vision-language models. Each page is represented both as an image and as text, enabling models to capture layout, diagrams, tables, and written content in parallel. Processing begins in batches for efficiency and automatically falls back to page-level retries under rate limits or when quality thresholds are not met. Retry strategies apply progressive backoff and alternative processing paths to maintain throughput and reliability.
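
In simplified form, that batching-and-fallback behaviour could be structured like the sketch below, where the VLM client and the quality check are placeholders standing in for the real services:

```python
import time
from typing import Callable

class RateLimitError(Exception):
    """Placeholder for the provider's rate-limit exception."""

def parse_pages(
    pages: list[dict],
    call_vlm: Callable[[list[dict]], list[str]],    # hypothetical VLM client
    quality_ok: Callable[[str], bool],              # hypothetical quality check
    batch_size: int = 8,
    max_retries: int = 3,
) -> list[str]:
    """Batch-first parsing with page-level fallback and progressive backoff."""
    results: list[str] = []
    for start in range(0, len(pages), batch_size):
        batch = pages[start:start + batch_size]
        try:
            results.extend(call_vlm(batch))          # fast path: the whole batch at once
            continue
        except RateLimitError:
            pass                                     # fall back to page-level retries
        for page in batch:
            for attempt in range(max_retries):
                try:
                    out = call_vlm([page])[0]
                    if quality_ok(out):
                        results.append(out)
                        break
                except RateLimitError:
                    pass
                time.sleep(2 ** attempt)             # progressive (exponential) backoff
    return results
```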

Hybrid extraction strategy

Extraction runs in parallel, combining visual and textual elements:

  • Visual extraction renders pages as images, preserving layout, formatting, tables, charts, and diagrams.
  • Textual extraction provides the core content, preserves reading order, and exposes metadata.

Both representations are merged into a single input, allowing the model to cross-reference them. This produces richer outputs, including descriptive analysis of the charts, tables, diagrams, and figures that are central to financial analysis, not just the written content. Crucially, Desia’s parsing solution also handles scanned documents, where traditional AI often fails.
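
A minimal sketch of merging the two representations into one request, assuming an OpenAI-compatible vision chat API (Desia's actual models and prompts are not disclosed here), might look like this:

```python
import base64
from openai import OpenAI

client = OpenAI()

def parse_page(page_png: bytes, page_text: str, model: str = "gpt-4o") -> str:
    """Send the rendered page image and the extracted text layer together."""
    image_b64 = base64.b64encode(page_png).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Convert this page to markdown. Use the rendered image for layout, "
                         "tables, charts, and diagrams, and the extracted text below to "
                         "cross-check wording and reading order.\n\n" + page_text},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```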

Spreadsheet specialization

Desia is built to handle spreadsheets with precision. These files are processed through a dedicated pathway that maintains their inherent structure. Instead of flattening spreadsheets into PDFs, the parser preserves:

  • Cell coordinates and relationships
  • Formula logic and dependencies
  • Data types (dates, currencies, numerical precision)
  • Advanced features such as pivot tables, charts, and conditional formatting

By preserving the logic behind the numbers and ensuring accurate interpretation of spreadsheet semantics, Desia makes AI-powered due diligence and portfolio value creation more reliable and effective.
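
For illustration only, the sketch below shows one way to retain coordinates, formula strings, and data types when reading an .xlsx file with openpyxl; it is an assumption about tooling rather than a description of Desia's parser.

```python
from openpyxl import load_workbook

def extract_sheet_logic(path: str) -> list[dict]:
    """Keep formulas, coordinates, and formatting hints instead of flattened values."""
    wb = load_workbook(path, data_only=False)         # data_only=False keeps formula strings
    cells = []
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if cell.value is None:
                    continue
                cells.append({
                    "sheet": ws.title,
                    "coordinate": cell.coordinate,        # e.g. "B7", so references stay traceable
                    "value": cell.value,                  # formula string such as "=B5*(1+B6)"
                    "is_formula": cell.data_type == "f",
                    "number_format": cell.number_format,  # currency, date, precision hints
                })
    return cells
```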

Context enrichment

After parsing, every document passes through a context enrichment pipeline designed to generate document-level intelligence. This crucial step ensures individual page outputs are interpreted in relation to the full document. The enrichment process uses a document-level cache to persist embeddings and analysis results across the entire file. The full document is first stored in the cache, enabling global context to be referenced throughout subsequent steps. Pages are then processed in batches, where each page-level analysis incorporates both the page content and references to the cached global context. This design enables:

  • Page summaries enriched with cross-document awareness
  • Identification of relationships between sections, tables, and figures across different pages
  • Detection of recurring themes or references that span multiple chapters or sections
  • Consistency in terminology and interpretation across the entire document
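
In skeletal form, the enrichment loop described above could look like the following sketch, where the summarisation and enrichment calls are placeholders and the document-level cache is reduced to an in-memory dictionary:

```python
from typing import Callable

def enrich_document(
    pages: list[str],
    summarise: Callable[[str], str],            # hypothetical document-level summariser
    enrich_page: Callable[[str, str], str],     # hypothetical page-level enrichment call
    batch_size: int = 4,
) -> list[str]:
    """Cache whole-document context first, then enrich pages in batches against it."""
    cache = {"global_context": summarise("\n\n".join(pages))}   # full document stored first
    enriched = []
    for start in range(0, len(pages), batch_size):
        for page in pages[start:start + batch_size]:
            # each page-level analysis sees its own content plus the cached global context
            enriched.append(enrich_page(page, cache["global_context"]))
    return enriched
```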

Validation

Extracted outputs undergo multi-level quality assurance. Visual and text-based results are cross-validated for consistency. Duplicate detection operates at both the page and context levels, filtering out repeated phrases and sentences. Documents must meet configurable quality thresholds for completeness, coherence, and structural integrity; failures trigger retries with adjusted strategies until acceptable results are achieved. The result is a cohesive, document-aware markdown output in which each page is not only parsed locally but also contextualized against the larger structure and narrative of the document.
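
A hypothetical shape for such a validation gate is sketched below; the completeness and coherence heuristics and their thresholds are invented for illustration, not drawn from Desia's configuration.

```python
from dataclasses import dataclass

@dataclass
class QualityThresholds:
    completeness: float = 0.9   # share of source tokens that must survive into the output
    coherence: float = 0.8      # share of output lines that must be unique (duplicate check)

def passes_validation(page_md: str, source_text: str, t: QualityThresholds) -> bool:
    """Return True only if the parsed page meets the configured quality thresholds."""
    tokens = set(source_text.split())
    completeness = len(tokens & set(page_md.split())) / max(len(tokens), 1)
    lines = [line for line in page_md.splitlines() if line.strip()]
    coherence = len(set(lines)) / max(len(lines), 1)
    return completeness >= t.completeness and coherence >= t.coherence
```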

Interested in learning more about Desia? Schedule a demo at desia.ai/try.