Desia

We recently benchmarked Desia on the vals.ai Finance Agent Benchmark, a test built around the kinds of tasks financial analysts actually do: analysing SEC filings, GAAP reconciliation, financial modelling, and cross-company analysis.

With GPT-5.4 indexed at 100, leading frontier models cluster within roughly 10 points of the baseline. Desia scored 133.
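
For readers less used to indexed scores, here is a minimal sketch of the arithmetic. It assumes the index is a simple ratio to the baseline model's raw score; the post does not publish the raw scores or the exact normalisation, so the function name `to_index` and the numbers below are made up purely for illustration.

```python
def to_index(raw_score: float, baseline_score: float) -> float:
    """Express raw_score relative to a baseline that is pinned at 100."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * raw_score / baseline_score


# Hypothetical raw accuracies, NOT the benchmark's published figures.
baseline = 0.50
print(to_index(baseline, baseline))          # the baseline always maps to 100
print(round(to_index(0.665, baseline), 1))   # a raw score 33% higher maps to 133
```

Under that assumption, an index of 133 means a raw score about a third higher than the baseline's, while a model "within 10 points" sits somewhere between 90 and 110.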

Frontier models are already exceptional. The difference is not raw intelligence. The difference is the system around the model.

Finance is not a single-prompt problem

A lot of AI evaluation still assumes the model is the product: ask a question, get an answer, measure whether it is right. That is not how financial work happens.

A real analyst task is rarely one question in one document. It is a workflow. It means finding the right filing, pulling the right section, checking the relevant period, comparing definitions across companies, reconciling reported and adjusted figures, running calculations, and making sure the final answer actually holds up.

In finance, failure rarely comes from one dramatic mistake. It comes from smaller ones that compound: pulling the wrong period, mixing GAAP and non-GAAP figures, missing a footnote, comparing metrics defined differently across companies, or repeating management commentary without checking whether the numbers support it.
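
To make a couple of these failure modes concrete, here is a minimal sketch of the kind of guard a workflow could run before two numbers are compared. It is an illustration only, not a description of how Desia works: the `Figure` record, the `comparability_issues` helper, the company names, and the values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Figure:
    """A reported number together with the context needed to compare it."""
    company: str
    metric: str   # e.g. "operating_income"
    period: str   # e.g. "FY2023"
    basis: str    # "GAAP" or "non-GAAP"
    value: float


def comparability_issues(a: Figure, b: Figure) -> List[str]:
    """Return the reasons two figures should not be compared directly."""
    issues = []
    if a.metric != b.metric:
        issues.append(f"metric mismatch: {a.metric} vs {b.metric}")
    if a.period != b.period:
        issues.append(f"period mismatch: {a.period} vs {b.period}")
    if a.basis != b.basis:
        issues.append(f"basis mismatch: {a.basis} vs {b.basis}")
    return issues


# Hypothetical figures: same metric, but different periods and bases.
x = Figure("Company A", "operating_income", "FY2023", "GAAP", 410.0)
y = Figure("Company B", "operating_income", "FY2022", "non-GAAP", 455.0)
for issue in comparability_issues(x, y):
    print(issue)
```

Each check is trivial on its own. The point is that the context travels with the number, so a mismatch is caught before it flows into a calculation.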

Those are not just reasoning failures. They are workflow failures. This is where single-model AI starts to break down.

What changes when you build the system properly

The best financial AI is not just a smarter model. It is a better system.

That means retrieving the right documents in the right order. It means using structured tools and financial data sources, not relying on generic search alone. It means breaking complex tasks into subtasks, routing them to the best-fit model or tool, and checking outputs before they are returned. Desia is built around that workflow reality.

Multiple specialized agents work in parallel on the same task. Each has access to the tools, context, and data sources best suited to that part of the workflow. Outputs are cross-checked before they are assembled into a final answer. The goal is not to generate something plausible. The goal is to produce work an analyst can actually rely on. None of this sounds magical in isolation. But together, it changes performance materially.
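
As a rough illustration of the pattern (parallel agents whose outputs are cross-checked before an answer is returned), here is a minimal sketch. It is not Desia's implementation: the two agent functions are hypothetical stand-ins that return constants, and the agreement rule (all agents within a relative tolerance) is an assumption chosen to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for specialised agents. A real agent would call
# a model or a data source instead of returning a constant.
def filings_agent(task: str) -> float:
    return 125.0  # placeholder: figure read from a filing


def market_data_agent(task: str) -> float:
    return 125.0  # placeholder: same figure from a structured data source


def run_agents(task: str, agents: Dict[str, Callable[[str], float]]) -> Dict[str, float]:
    """Run every agent on the task in parallel and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


def cross_check(results: Dict[str, float], tolerance: float = 0.01) -> float:
    """Return the agreed value, or raise if any agent deviates from the first."""
    values = list(results.values())
    reference = values[0]
    for value in values[1:]:
        if abs(value - reference) > tolerance * max(abs(reference), 1e-12):
            raise ValueError(f"agents disagree: {results}")
    return reference


agents = {"filings": filings_agent, "market_data": market_data_agent}
results = run_agents("FY2023 revenue for a hypothetical company", agents)
print(cross_check(results))
```

The design choice worth noting is that disagreement raises an error rather than being silently averaged away: an answer that cannot be cross-checked is surfaced instead of returned.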

Accuracy matters. Speed matters too.

There is a second dimension that matters just as much in production: speed.

An answer that takes too long is not useful, no matter how accurate it is. If a system does not fit inside the pace of real analyst work, it does not get adopted. In our benchmark run, Desia combined strong accuracy with a median response time of around two minutes.

The bar is not “can the model solve the task eventually?” The bar is “can the system solve it quickly enough, reliably enough, and with enough trust built in that a team will actually use it?”

What this means

As base models continue to improve, it becomes even clearer that the durable advantage is not the model alone. Two systems built on the same frontier model can perform very differently in practice. The difference is everything around the model: the parts that are invisible in a demo but decisive in production.

Benchmarks help cut through the noise. But the real test is always your own workflow: the tasks your team runs every week, under real time pressure, with real documents and real standards of accuracy.

If you are evaluating AI for research, diligence, or portfolio workflows, we would be happy to show how Desia performs on one of your team’s real workflows. Get in touch.
