The refusal contract
The most dangerous thing a legal or financial assistant can do is answer confidently when it shouldn’t. A plausible-sounding but unsupported answer is worse than no answer — it gets relied upon.
KAOS makes refusal a first-class, typed outcome, not an afterthought.
flowchart TD
q["Question + retrieved corpus"] --> claims["Candidate claims<br/><small>each with a span</small>"]
claims --> v{"Enough claims<br/>verify against source?"}
v -->|yes| ans["✅ Answer<br/><small>with cited claims</small>"]
v -->|no| ref["🛑 Refusal<br/><small>EvidenceInsufficient (typed)</small>"]
classDef good fill:#f0fdf4,stroke:#22c55e,color:#14532d;
classDef stop fill:#fef2f2,stroke:#ef4444,color:#7f1d1d;
class ans good;
class ref stop;
Refuse rather than hallucinate
When a research agent can’t find adequate support for an answer in the retrieved corpus,
it doesn’t paper over the gap with a guess. It emits a typed signal —
EvidenceInsufficient / a grounded-refusal event — that downstream code can branch on.
“I don’t have enough to answer that” is a valid, expected, machine-readable result.
This pairs directly with grounded citations: every claim must carry a span that verifies against the source. If the support isn’t there, the claim is rejected — and if too many claims are rejected, the answer becomes a refusal rather than a fabrication.
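As a rough illustration of the rule above, here is a minimal Python sketch. Only the name `EvidenceInsufficient` comes from this page; `Claim`, `Answer`, `span_verifies`, `resolve`, and the `min_verified_fraction` threshold are hypothetical names chosen for the example, not KAOS's actual API. Exact string matching of the span is also an assumption, since the page does not say how verification is done or how many rejections count as "too many".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str    # the statement the agent wants to make
    doc_id: str  # which retrieved document supports it
    start: int   # span offsets into that document: [start, end)
    end: int
    quote: str   # the text the agent says appears at that span


@dataclass(frozen=True)
class Answer:
    claims: tuple[Claim, ...]  # only verified claims


@dataclass(frozen=True)
class EvidenceInsufficient:
    rejected: tuple[Claim, ...]
    reason: str


def span_verifies(claim: Claim, corpus: dict[str, str]) -> bool:
    """A claim verifies only if its quoted span is literally present in the source."""
    source = corpus.get(claim.doc_id)
    if source is None:
        return False
    if not (0 <= claim.start < claim.end <= len(source)):
        return False
    return source[claim.start:claim.end] == claim.quote


def resolve(
    claims: list[Claim],
    corpus: dict[str, str],
    min_verified_fraction: float = 0.8,  # assumed threshold, for illustration only
) -> Answer | EvidenceInsufficient:
    """Return an Answer with verified claims, or a typed refusal."""
    verified = tuple(c for c in claims if span_verifies(c, corpus))
    rejected = tuple(c for c in claims if not span_verifies(c, corpus))
    if not claims:
        return EvidenceInsufficient(rejected=(), reason="no candidate claims")
    if len(verified) / len(claims) < min_verified_fraction:
        return EvidenceInsufficient(rejected=rejected, reason="too many claims failed verification")
    return Answer(claims=verified)
```

The point of the sketch is the return type: the caller always gets one of two distinct types, so a refusal never arrives disguised as answer text.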
Why typed, not just a string
Because refusal is a typed outcome:
- Callers can act on it. A pipeline can route an insufficient-evidence result to a human, widen the search, or ask a clarifying question — deterministically.
- It can’t be mistaken for an answer. A free-text “I’m not sure…” can slip through a consumer that just renders the text. A typed refusal can’t.
- It’s measurable. You can track refusal rate as a quality signal — too high means retrieval is failing; too low might mean the agent is over-claiming.
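To make the three points above concrete, here is a small Python sketch of a consumer that branches on the outcome type and tracks refusal rate. It is illustrative only: `Answer`, `route`, `refusal_rate`, and the route labels are hypothetical, and only `EvidenceInsufficient` is a name taken from this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Answer:
    text: str


@dataclass(frozen=True)
class EvidenceInsufficient:
    reason: str


Outcome = Union[Answer, EvidenceInsufficient]


def route(outcome: Outcome) -> str:
    """Pick the next pipeline step from the outcome's type, never from its text."""
    if isinstance(outcome, EvidenceInsufficient):
        # Could equally widen the search or ask a clarifying question.
        return "escalate_to_human"
    if isinstance(outcome, Answer):
        return "render_answer"
    # Anything else is a programming error, not something to render.
    raise TypeError(f"unexpected outcome type: {type(outcome).__name__}")


def refusal_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of outcomes that were refusals; 0.0 for an empty batch."""
    if not outcomes:
        return 0.0
    refusals = sum(isinstance(o, EvidenceInsufficient) for o in outcomes)
    return refusals / len(outcomes)
```

Because the branch is on the type, a free-text hedge such as "I’m not sure…" inside an `Answer` would still be routed as an answer, which is exactly why the refusal has to be its own type rather than a phrase.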
The principle
This is the same stance as “why plain BM25” and “cost as a contract”: prefer the honest, accountable default. An agent that knows when to say “not enough evidence” is more trustworthy than one that always has an answer.