Catching LLM Hallucinations with Uncertainty

LLMs fail without warning. A separate uncertainty estimate lets a model abstain, escalate, or defer instead of confidently making things up.

LLMs hallucinate — they produce fluent, confident, and wrong answers, with no outward sign that anything is off. That’s fine in a brainstorming tool and dangerous in a system that answers customer questions, reviews documents, or decides an agent’s next action. The hard part isn’t that models are sometimes wrong; it’s that they’re wrong without warning.

The usual signal, the softmax probability, doesn’t help much. It reflects how strongly the model prefers one output over the alternatives, not whether that preference is well founded, so a model can be supremely confident and still completely wrong, especially on inputs unlike anything it saw in training. What you actually want is a separate estimate of the model’s uncertainty: a signal that says “this one is on shaky ground,” independent of how fluent the answer sounds.
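To make the gap concrete, here is a minimal sketch in plain Python. It compares the max-softmax confidence of three hypothetical models with a separate signal, how much those models disagree with each other. Ensemble disagreement is used only as a familiar stand-in for “a separate uncertainty estimate”; it is not necessarily the method used in our paper or in Capsa, and the logits are invented for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def ensemble_disagreement(member_logits):
    """Fraction of ensemble members that do not vote with the plurality class."""
    votes = [argmax(logits) for logits in member_logits]
    top_count = max(votes.count(v) for v in set(votes))
    return 1.0 - top_count / len(votes)

# Invented logits from three hypothetical ensemble members
# for one input unlike the training data.
members = [
    [9.0, 1.0, 0.5],  # member A strongly prefers class 0
    [0.5, 8.5, 1.0],  # member B strongly prefers class 1
    [1.0, 0.5, 9.5],  # member C strongly prefers class 2
]

confidences = [max(softmax(logits)) for logits in members]
disagreement = ensemble_disagreement(members)

print("per-member softmax confidence:", [round(c, 4) for c in confidences])
print("ensemble disagreement:", round(disagreement, 2))
```

Each member reports a softmax confidence above 0.99, yet no two of them agree on the answer. The softmax number alone would never flag this input; the disagreement score does.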

That’s the idea behind uncertainty-aware language models. In our work on selective question answering (published as arXiv:2311.15451), we converted standard models (BERT on SQuAD, Llama 2 on TruthfulQA) into variants that output an uncertainty estimate alongside every answer. With that estimate, the model can choose when to answer: respond when it’s confident, and abstain or defer when it isn’t. The result is a model that answers a large share of questions accurately while quietly stepping back from the ones it would likely have gotten wrong.
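Selective answering of this kind is usually judged by two numbers: coverage (the share of questions the model chooses to answer) and accuracy on the answered subset. The sketch below shows that bookkeeping under stated assumptions: the `Prediction` type, the uncertainty scale (higher means less sure), the threshold values, and the five toy records are all hypothetical and are not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Prediction:
    answer: str
    uncertainty: float  # assumed scale: higher means less sure
    correct: bool       # known only on a labeled evaluation set

def selective_metrics(
    preds: List[Prediction], threshold: float
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on answered questions) for a given threshold.

    A question is answered when its uncertainty is at or below the threshold;
    otherwise the model abstains. Accuracy is None if nothing is answered.
    """
    if not preds:
        return 0.0, None
    answered = [p for p in preds if p.uncertainty <= threshold]
    coverage = len(answered) / len(preds)
    if not answered:
        return coverage, None
    accuracy = sum(p.correct for p in answered) / len(answered)
    return coverage, accuracy

# Toy evaluation set, invented for illustration only.
toy = [
    Prediction("Paris", 0.05, True),
    Prediction("1969", 0.10, True),
    Prediction("Mercury", 0.30, False),
    Prediction("Oxygen", 0.60, True),
    Prediction("Einstein", 0.80, False),
]

for threshold in (1.0, 0.5, 0.2):
    coverage, accuracy = selective_metrics(toy, threshold)
    print(f"threshold={threshold}: coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

Lowering the threshold trades coverage for accuracy. On this toy set the model answers fewer questions as the threshold drops, but a larger share of what it does answer is correct. Where to set the threshold is a product decision that depends on how costly a wrong answer is compared with an abstention.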

That “answer-or-abstain” decision is a risk policy, and the same pattern can make other AI systems safer to deploy. In practice it comes down to three moves:

Abstain — when uncertainty is high, don’t emit the answer: return “I don’t know,” ask a clarifying question, or fall back to a safe default.
Escalate — route the uncertain case to a stronger model, a retrieval step, or a human reviewer.
Adapt — log the uncertain inputs; they’re exactly the examples worth labeling for the next round of training.
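The three moves above can be combined into one small policy function. This is a sketch, not Capsa’s API: `apply_policy`, the `Action` enum, and both thresholds are hypothetical names and values chosen for illustration. It also assumes that moderately uncertain cases are worth escalating while the most uncertain ones get a safe default; your system may reasonably order those differently.

```python
from enum import Enum
from typing import List, Optional, Tuple

class Action(Enum):
    ANSWER = "answer"
    ESCALATE = "escalate"
    ABSTAIN = "abstain"

# Each logged item is (question, proposed answer, uncertainty).
ReviewItem = Tuple[str, str, float]

def apply_policy(
    question: str,
    answer: str,
    uncertainty: float,
    review_queue: List[ReviewItem],
    answer_below: float = 0.2,    # hypothetical threshold
    escalate_below: float = 0.6,  # hypothetical threshold
) -> Tuple[Action, Optional[str]]:
    """Map an uncertainty score to an action and the text to return, if any."""
    if uncertainty < answer_below:
        # Confident enough: emit the model's answer unchanged.
        return Action.ANSWER, answer

    # Adapt: every uncertain input is logged as a candidate for labeling.
    review_queue.append((question, answer, uncertainty))

    if uncertainty < escalate_below:
        # Escalate: the caller routes this to a stronger model,
        # a retrieval step, or a human reviewer.
        return Action.ESCALATE, None

    # Abstain: fall back to a safe default instead of the model's answer.
    return Action.ABSTAIN, "I don't know."

queue: List[ReviewItem] = []
for q, a, u in [
    ("What is the capital of France?", "Paris", 0.05),
    ("What does clause 14(b) permit?", "Early termination", 0.45),
    ("Who approved the 2031 budget?", "The finance committee", 0.90),
]:
    action, text = apply_policy(q, a, u, queue)
    print(action.value, "|", text)

print("queued for labeling:", len(queue))
```

Keeping the policy in one function separates the model’s job (produce an answer and an uncertainty score) from the deployment decision (what to do with them), so thresholds can be tuned without retraining.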

This generalizes well beyond Q&A. Any LLM or agent step that makes a discrete decision — a classification, a tool call, a yes/no — can be gated on uncertainty, so the system automates what it understands and surfaces what it doesn’t. It’s the same shift we keep coming back to: not a model that’s never wrong, but one that knows when it might be.

Capsa adds this uncertainty layer to existing models. Read the docs, or request access to try it on your own models.

For the full method and results, see Uncertainty-Aware Language Modeling for Selective Question Answering.