Why the Anti-Fabrication System Keeps Failing

Four independent auditors ran the same prompt. Here is where they agree, where they split, and what I verified against the live system.

Auditors: Gemini 3.1 Pro Deep Research · Grok 4.3 · GPT-5.4 (xhigh) · Woz (Claude Opus 4.8, the instance that built it)  |  Date: 2026-06-06

Unanimous verdict

All four say the same thing: the architecture is wrong, not the regex.

Post-hoc pattern matching over free prose cannot tell a fabrication from a quote, a room count from a dollar figure, or a claim from an example. Every patch makes it more brittle. Delete the chat-blocking hook and the hand-edited fact file. Move the control to the only place it works: the outbound document, gated against actual source evidence.
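A minimal sketch of that failure, assuming a hypothetical number-matching pattern (this is not the hook's actual regex, and the sample sentences are invented for illustration). The same token is flagged whether the sentence is a claim, a quote, an example, or a dollar figure, because the pattern sees characters and not intent.

```python
import re

# Hypothetical pattern for illustration only; the real hook's regex is not shown here.
NUMBER = re.compile(r"\b\d[\d,]*\b")


def flag_numbers(text: str) -> list[str]:
    """Return every number-like token the pattern would flag."""
    return NUMBER.findall(text)


# Invented sample sentences: four different intents, one shared number.
samples = {
    "claim": "The hotel has 143 keys.",
    "quote": 'The OM says "143 keys" on page 4.',
    "example": "For example, a report might claim 143 keys.",
    "dollar figure": "The roof repair cost 143 thousand dollars.",
}

for label, text in samples.items():
    # Every sample is flagged; nothing in the match says which one is a fabrication.
    print(label, flag_numbers(text))
```

Tightening the pattern to exclude one of these cases tends to require knowing the intent of the sentence, which is exactly the information a regex does not have.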

The four auditors

Gemini (gemini 3.1 pro, deep research)
Tiered: honesty prompt, Graphify grounding, LLM judge. The one auditor who votes to migrate to Hermes.

Grok (grok-4.3)
Terse and absolute. Delete everything; keep the honesty prompt and one outbound judge.

GPT-5.4 (codex, xhigh reasoning)
Most concrete. Two-lane workflow, per-document evidence ledger.

Woz (claude-opus-4.8)
Built it. Verified the live data the others had to take on trust.

What I checked in the live system

Measurement artifact
Prompt claim #6: "1,015 of 1,028 recent hits are blocked, so the silent-in-chat gate fails."
Partly false. The blocked column defaults to true and the logger never sets it (hook lines 209-228). So 1,043 of 1,056 rows read blocked=true regardless of whether they blocked. The dashboard is lying about itself, the same disease it polices. All three external auditors repeated the 1,015 of 1,028 figure as fact; none caught that the column is never written. The real chat blocker is narrower: the always-on FACT REGISTRY check.
Source: my psql query + hook source, 2026-06-06

Dead config
The registry's top-priority rule is NEVER_CALCULATE_OCC_ADR_REVPAR.
Never enforced. The hook has zero references to hard_rules, so the single most important constraint is not read by a single line of code. Gemini 3.1 Deep Research caught this too by reading the source; I confirmed it by grep.
Source: grep of fabrication-check.py, 2026-06-06

Confirmed
The fact registry has silently self-contradicted on key counts.
True. The log shows HOUUS flagged against both 143 and 127 keys at different times. A hand-edited JSON file used as a safety wall drifts under the very agent it constrains.
Source: fabrication_log pattern labels, 2026-06-06

Confirmed
The streak counter reports a clean streak while the live log takes daily hits.
True. The guardrails table froze at 84 entries on 2026-05-31. The live log added 11 hits on 2026-06-06 alone. The metric and reality have diverged.
Source: lie_log_guardrails + fabrication_log, 2026-06-06
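The measurement artifact is easy to reproduce in miniature. The sketch below uses an in-memory SQLite table in place of the live Postgres database so that it is self-contained. The schema, the `log_hit` function, and the pattern names are assumptions for illustration; only the table name and the default-true `blocked` column come from the findings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the default-true `blocked` column mirrors the finding.
conn.execute(
    "CREATE TABLE fabrication_log ("
    " id INTEGER PRIMARY KEY,"
    " pattern TEXT NOT NULL,"
    " blocked INTEGER NOT NULL DEFAULT 1)"
)


def log_hit(pattern: str, did_block: bool) -> None:
    """Buggy logger: it knows whether it blocked but never writes the column."""
    conn.execute("INSERT INTO fabrication_log (pattern) VALUES (?)", (pattern,))


# One real block and two silent passes, all logged the same way.
log_hit("key_count", did_block=True)
log_hit("dollar_figure", did_block=False)
log_hit("quote", did_block=False)

blocked, total = conn.execute(
    "SELECT SUM(blocked), COUNT(*) FROM fabrication_log"
).fetchone()
print(f"{blocked} of {total} rows read blocked=true")  # prints "3 of 3 rows read blocked=true"
```

One way to make this class of bug visible would be to drop the default so that an insert without an explicit `blocked` value fails instead of silently reading as true.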

Where all four agree

| Question | Gemini | Grok | GPT-5.4 | Woz |
|---|---|---|---|---|
| Is post-hoc regex on prose the wrong layer? | Yes | Yes | Yes | Yes |
| Delete the chat-blocking Stop hook? | Yes | Yes | Yes | Yes |
| Delete the hand-edited fact-registry.json as a safety wall? | Yes | Yes | Yes | Yes |
| Kill the bespoke self-improvement loop (brain / lie-log / guardrails)? | Yes | Yes | Yes | Yes |
| Is the intel feed a hoard that needs a decide-by rule? | Yes | Yes | Yes | Yes |
| Keep behavioral honesty prompt as the chat-lane default? | Yes | Yes | Yes | Yes |

Where they split (this is the decision)

| Question | Gemini | Grok | GPT-5.4 | Woz |
|---|---|---|---|---|
| How do you catch a fabricated number in an outbound doc? | Three tiers: honesty prompt, Graphify grounding, then an LLM-judge final gate comparing the outbound PDF against the source OM | One LLM judge on outbound only: "is every number sourced?" | Per-document evidence ledger: each number binds to value + source + page/cell. Deterministic field-to-evidence check, LLM judge secondary | Agree with GPT. Deterministic bind beats a judge; the judge is a backstop, not the gate |
| Migrate OpenClaw to Hermes? | MIGRATE. A "highly favorable tradeoff", but concedes it only helps if you abandon the Claude Code loop entirely | Stay. The problem is the brain, not the legs | Not now. Freeze 30 days, then pilot on one low-risk job | Stay. Fix the workflow first; migration is not what is on fire |
| Graphify knowledge graph as the fact store? | KEEP. "the cornerstone of the new architecture" | Evaluate as registry replacement | NOT the fix. A code/doc graph is not a source of truth for live deal numbers in PDFs and spreadsheets | Side with GPT. Deal numbers live in OMs and STR files, not a tree-sitter graph |
| LLM judge as the primary gate? | Yes, as the final outbound gate | Yes, primary on outbound | Only for narrow "claim vs quote" calls, not the main gate | No. Deterministic first, judge second |
| Single biggest blind spot? | Identity crisis: a hotel firm running an AI research lab | The source documents are not in context when numbers are requested | A noisy detector is being promoted into policy: self-poisoning | The agent polices itself with no independent referee |

The sharpest line from each

Gemini 3.1 Deep Research

"The firm must accept that it is a consumer of AI technology, not a developer of AI orchestration frameworks."

Strongest on the identity crisis and on Graphify as the fact store. The lone vote to migrate to Hermes.

Grok 4.3

"The agent is routinely asked to produce financial numbers without the source documents in context. Fabrication is the predictable result."

The only auditor to name the upstream cause: no grounding data present at generation time. Detection treats the symptom.

GPT-5.4

"You are letting a noisy detector write policy. False positives are promoted into guardrails and streaks. The machine teaches itself from garbage."

Also caught that source-shaped syntax (backticks, "per Ace") satisfies the gate without real evidence, so the hook trains bypass behavior.

Woz (built it)

"Every control here is maintained by the same agent it is supposed to constrain. The fact file, the parser, the streak counter: no independent referee. That is why they drift and freeze silently."

The verified failures (default-true blocked column, unparsed hard_rules, self-contradicting registry) are all symptoms of self-refereeing.

The synthesis: what to actually build

Four auditors, one converged design. Grok found the cause, GPT found the cleanest cure, Gemini supplied the discipline, and the live checks confirm the failure modes. This is the least-bloat version of all four:

1. Turn off the hook as a chat blocker, today. Delete FACT REGISTRY live enforcement, the completion / identity / temporal regex, and the streak + guardrails shadow pipeline. Keep only an outbound-surface check. This stops the daily denial-of-service on normal work.
2. Two lanes. Chat lane: no blocking, just the honesty prompt (facts vs assumptions, say "I don't know", ask for the source). Publish lane (lender packet, external email): no free-typed numbers.
3. One evidence ledger per outbound document. Each critical number carries value, unit, source file, and page or cell. The draft is rendered from the ledger. A missing number stays "unknown", never a guess. A deterministic check confirms every number in the document maps to a ledger entry before send. This replaces the regex hook and the JSON registry.
4. Load the source before asking for the number. Grok's point. Most fabrications happen because the OM or STR file was not in context. Make the source document a precondition of the task, not an afterthought.
5. One honest metric, no autonomous self-improvement. Track confirmed outbound errors and human corrections, not hook hits. Rename the lie log: separate "detector alert" from "confirmed fabrication". Weekly manual review tweaks the prompt. No self-rewriting guardrails on contaminated signals.
6. Freeze the platform for 30 days. Stay on OpenClaw. Hermes does not fix Claude Code hooks unless you abandon the Claude Code loop entirely, which is a real migration with its own regressions. Gemini 3.1 dissents and favors a planned migration, but even it concedes the benefit only lands if you drop the Claude Code loop. Fix the workflow first, then pilot Hermes on one low-risk job only if the maintenance burden still hurts.
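The evidence ledger in step 3 can be sketched in a few lines. This is an illustration under stated assumptions, not a specification: the `LedgerEntry` fields follow the list in step 3, while the `{name}` placeholder syntax, the function names, the file name `om.pdf`, and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    value: str        # exact string that will appear in the document
    unit: str
    source_file: str  # hypothetical file name in the example below
    locator: str      # page or cell


# Plain integers, thousands-separated integers, and decimals.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def render(template: str, ledger: dict[str, LedgerEntry]) -> str:
    """Fill {name} placeholders from the ledger; a missing entry stays 'unknown'."""
    def fill(match: re.Match) -> str:
        entry = ledger.get(match.group(1))
        return entry.value if entry else "unknown"
    return re.sub(r"\{(\w+)\}", fill, template)


def unsourced_numbers(document: str, ledger: dict[str, LedgerEntry]) -> list[str]:
    """Return every number in the document that has no matching ledger value."""
    allowed = {entry.value for entry in ledger.values()}
    return [n for n in NUMBER.findall(document) if n not in allowed]


# Sample data is invented for illustration.
ledger = {"keys": LedgerEntry("143", "keys", "om.pdf", "p. 4")}

draft = render("The property has {keys} keys and {adr} ADR.", ledger)
print(draft)                             # The property has 143 keys and unknown ADR.
print(unsourced_numbers(draft, ledger))  # []

typed = "The property has 143 keys and a 212 ADR."
print(unsourced_numbers(typed, ledger))  # ['212']
```

This check matches on value alone, so a sourced number reused in the wrong sentence would still pass. A stricter version would probably track which placeholder produced each number, which is one more reason to render from the ledger and not type numbers by hand.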

From the intel feed, the external independent reviewer survives every review and becomes the outbound check. Two items split the panel: Gemini 3.1 keeps Graphify as the new fact store while GPT and I say deal numbers do not live in a code graph; and GPT and I keep least-privilege tools as baseline safety while Gemini calls them bloat for this problem. Grill-me is bloat by unanimous vote. The hoard itself should become a decision register with a decide-by date.
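A decision register with a decide-by date could be as small as the sketch below. The field names, the `overdue` helper, and every date are assumptions for illustration; only the item names and the unanimous rejection of Grill-me come from the panel.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IntelItem:
    title: str
    decide_by: date                 # hypothetical deadline
    decision: Optional[str] = None  # e.g. "adopt" or "reject"; None means undecided


def overdue(items: list[IntelItem], today: date) -> list[IntelItem]:
    """Return undecided items whose decide-by date has passed."""
    return [i for i in items if i.decision is None and i.decide_by < today]


# Dates are invented for illustration.
register = [
    IntelItem("Graphify as fact store", date(2026, 6, 1)),
    IntelItem("Grill-me", date(2026, 6, 1), decision="reject"),
    IntelItem("Least-privilege tools", date(2026, 7, 1)),
]

late = [item.title for item in overdue(register, today=date(2026, 6, 6))]
print(late)  # ['Graphify as fact store']
```

The point of the structure is that an item can no longer sit in the feed indefinitely: it is either decided or it shows up as overdue.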