Every framework I evaluated had the same blind spot: they trusted the agent to report what it did.
"Here's what I executed," the agent would say. "Here's what happened. Here's what I'm doing next."
And most systems just... believed it. They built audit trails from self-reported logs. They made governance decisions based on what the agent claimed to have done. They committed to quality gates that depended entirely on the agent being honest about its own performance.
That's not observation. That's taking testimony from the defendant and calling it due process.
When I started building VNX—the governance layer behind our multi-agent AI orchestration—I realized that the moment my system became complex enough to need governance, I could no longer afford to trust an agent's self-report. Not because agents are dishonest, but because they're fundamentally limited in what they can see about their own behavior.
An AI agent executing code doesn't know if the code changed what it intended to change. It doesn't know if a subtle bug is cascading through downstream systems. It doesn't know if it made a decision based on a hallucination that won't surface for weeks.
It can report what it tried to do. But that's not the same as what it actually did.
The External Watcher Pattern is how I solved this.
The Broken Model: Self-Reporting Observability
Let me be concrete about what fails with agent self-reporting.
In a typical multi-agent system, Agent A completes a task and says: "Done. I created three files. I ran the tests. All passed. Here's my report."
Agent B trusts that report and builds on it: "Got it. You created three files. I'll integrate them now."
Then the integration breaks. Why? Because Agent A didn't know that one of the three files had a syntax error that only showed up when imported a certain way. Agent A ran tests on its own modules. It never tested the integration point.
Agent A's self-report was honest but incomplete. The system failed anyway.
This happens constantly in AI orchestration:
- An agent runs a test suite and reports success, but a performance regression isn't caught by the test harness
- An agent claims it validated input, but downstream the input causes an edge case failure
- An agent reports a dependency update as safe, but it created a subtle version mismatch
- An agent submits code for review and reports it "validated against requirements," but the actual files in git don't match what it claimed to validate
The pattern is always the same: the agent reports based on what it knows about, not what it actually affected.
The Dual-Input Bridge: How External Watching Works
The External Watcher Pattern breaks this cycle by observing agents through two independent channels:
- Input Channel 1: Agent Hooks (when available) — Structured callbacks from the agent execution environment that emit real-time signals about what the agent is doing
- Input Channel 2: Filesystem Watching (always works) — A neutral observer that watches what actually changed in the system, regardless of what the agent claims
These two channels feed into a single unified observation stream. When they align, you have confidence. When they diverge, you've found a critical gap.

Here's the actual architecture from VNX:
```
Agent Execution → Hook Events (if available)
          ↓
  Receipt Processor V4
          ↓
Dual-Input Validator Bridge
      ↙         ↘
Hook Reports      Filesystem Truth
(what the agent   (what actually
 claims)           happened)
      ↘         ↙
Unified Receipt (conflict-resolved)
          ↓
  VNX Governance Pipeline
```

The Receipt Processor V4 is the engine. It monitors two things simultaneously:
- Hook channel: Real-time JSON events from the agent's execution environment (if it supports hooks)
- Filesystem channel: File modifications, creations, deletions captured by a neutral watcher
Then it runs a simple but powerful conflict-detection algorithm: "Did the agent report it created a file? Does that file actually exist? Are the contents what the agent said they'd be?"
If yes to all three, the receipt passes. If any misalignment exists, the receipt is flagged and escalated to the next quality gate.
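In Python, that three-question check can be sketched in a few lines. The claim structure and the `sha256` field here are my illustration, not VNX's actual receipt schema:

```python
import hashlib
from pathlib import Path

def validate_claim(claim: dict) -> str:
    """Validate one 'created/modified file' claim against the filesystem.

    `claim` is an illustrative structure ({"path": ..., "sha256": ...}),
    not VNX's real schema. Returns "ALIGNED" or a divergence reason.
    """
    # Check 1: did the agent report a file at all?
    if not claim.get("path"):
        return "DIVERGENT: no file path in claim"
    path = Path(claim["path"])
    # Check 2: does that file actually exist?
    if not path.is_file():
        return f"DIVERGENT: {path} does not exist"
    # Check 3: are the contents what the agent said they'd be?
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if claim.get("sha256") and actual != claim["sha256"]:
        return f"DIVERGENT: content hash mismatch for {path}"
    return "ALIGNED"
```

Any non-`"ALIGNED"` result is exactly the misalignment that gets flagged and escalated.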
Real Implementation: receipt_processor_v4.sh
Let me show you the actual code that powers this in VNX.
The Receipt Processor runs continuously and monitors the unified reports directory:
```bash
#!/bin/bash
# receipt_processor_v4.sh
# Watches .vnx-data/unified_reports/*.md and generates receipts

REPORTS_DIR=".vnx-data/unified_reports"
RECEIPTS_DIR=".vnx-data/receipts"
HOOK_EVENTS=".vnx-data/hook_events"

while true; do
  for report in "$REPORTS_DIR"/*.md; do
    [[ -f "$report" ]] || continue  # skip the literal glob when no reports exist
    receipt="$RECEIPTS_DIR/$(basename "$report" .md).json"
    hook_file="$HOOK_EVENTS/$(basename "$report" .md).json"
    if [[ ! -f "$receipt" ]]; then
      # Parse the markdown report
      python3 report_parser.py "$report" > /tmp/parsed.json

      # If hook events exist for this task, compare them
      if [[ -f "$hook_file" ]]; then
        HOOK_DATA=$(cat "$hook_file")
        # Run dual-input validation
        jq -n \
          --slurpfile report /tmp/parsed.json \
          --argjson hooks "$HOOK_DATA" \
          '{
            timestamp: now | todate,
            task_id: $report[0].task_id,
            reported_changes: $report[0].files_modified,
            hook_signals: $hooks.events,
            validation: (
              if ($report[0].files_modified | length) ==
                 ($hooks.events | map(select(.type == "file_change")) | length)
              then "ALIGNED"
              else "DIVERGENT"
              end
            ),
            status: "PROCESSED"
          }' > "$receipt"
      else
        # No hooks available, trust filesystem observation
        jq -n \
          --slurpfile report /tmp/parsed.json \
          '{
            timestamp: now | todate,
            task_id: $report[0].task_id,
            validation_method: "FILESYSTEM_ONLY",
            status: "PROCESSED"
          }' > "$receipt"
      fi
    fi
  done
  sleep 5
done
```

The key insight: this doesn't need hook events to work. It degrades gracefully. If the agent environment doesn't provide hooks, the watcher relies entirely on filesystem truth. But when hooks are available, they provide that second channel of validation.
The report_parser.py script extracts the actual agent claims from markdown:
```python
#!/usr/bin/env python3
# report_parser.py - Extract structured data from markdown reports
import sys
import re
import json

def parse_report(markdown_content):
    """Extract task claims, file changes, validation steps"""
    report = {
        "task_id": None,
        "files_modified": [],
        "tests_run": [],
        "validation_steps": [],
        "claimed_status": None
    }

    # Find task ID
    match = re.search(r'Task ID: (T[0-3]-\d+)', markdown_content)
    if match:
        report["task_id"] = match.group(1)

    # Find all "modified X" claims
    for match in re.finditer(r'Modified: `([^`]+)`', markdown_content):
        report["files_modified"].append(match.group(1))

    # Find test execution claims
    for match in re.finditer(r'Test: ([^\n]+) → (PASS|FAIL)', markdown_content):
        report["tests_run"].append({
            "test": match.group(1),
            "result": match.group(2)
        })

    # Extract status claim
    match = re.search(r'Status: (COMPLETE|FAILED|ESCALATED)', markdown_content)
    if match:
        report["claimed_status"] = match.group(1)

    return report

if __name__ == "__main__":
    with open(sys.argv[1], 'r') as f:
        markdown = f.read()
    parsed = parse_report(markdown)
    # Emit a single JSON object; the receipt processor reads it with
    # jq --slurpfile, which wraps the file's contents in an array itself.
    print(json.dumps(parsed))
```

This is deliberately simple. It's not trying to be intelligent. It's extracting claims from the agent's own words, then comparing those claims against what actually happened on the filesystem.
Multi-Provider Dispatch: Provider-Neutral Observation
One of the design requirements for VNX was supporting multiple AI providers without special cases. A task might be dispatched to:
- Claude Code via `/skill-name`
- Claude via direct API
- Codex CLI with `$skill-name`
- Gemini CLI with `@skill-name`
The External Watcher doesn't care. It doesn't need to integrate with each provider's specific logging. It just watches the filesystem.

When a dispatch is created, VNX records:
```json
{
  "dispatch_id": "D-2026-0305-001",
  "created_at": "2026-03-05T10:30:00Z",
  "assigned_to": "T1",
  "provider": "claude_code",
  "skill": "refactor_component",
  "baseline_files_hash": "a3f2e91...",
  "filesystem_snapshot": {
    "src/components/": [list of files and hashes],
    "src/lib/": [list of files and hashes]
  }
}
```

The watcher then monitors those exact files. When the provider's agent completes execution, it generates a unified report (markdown). The Receipt Processor reads that report, extracts claims, and compares:
- Files the agent claims to have modified → Files that actually changed in the filesystem
- Tests the agent claims to have run → Existence of test artifacts, test logs
- Dependencies the agent says it verified → Lock files, package versions
- Performance targets the agent says it met → Benchmark logs (if they exist)
The provider-neutral part is critical: I don't need to parse Claude's thinking blocks, or Codex's execution logs, or Gemini's intermediate outputs. Those are implementation details. What matters is the filesystem truth—and that's universal across all providers.
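Filesystem truth here reduces to two hash maps: one snapshot taken at dispatch time (like the `filesystem_snapshot` in the dispatch record above) and one at completion. A minimal sketch of that comparison, with function names that are mine rather than VNX's:

```python
import hashlib
from pathlib import Path

def snapshot(root: str) -> dict:
    """Hash every file under `root` -> {relative_path: sha256}."""
    base = Path(root)
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in base.rglob("*") if p.is_file()
    }

def actual_changes(before: dict, after: dict) -> set:
    """Files created, modified, or deleted between two snapshots."""
    return {
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }

def divergence(claimed: set, before: dict, after: dict) -> dict:
    """Compare the agent's claimed file list against filesystem truth."""
    changed = actual_changes(before, after)
    return {
        "unclaimed_changes": sorted(changed - claimed),   # happened, not reported
        "unverified_claims": sorted(claimed - changed),   # reported, didn't happen
    }
```

Either non-empty list is a divergence, and neither one requires knowing anything about the provider that did the work.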
What the Watcher Catches That Self-Reporting Misses
Let me give you concrete examples from running VNX in production.
Example 1: The Silent Dependency Regression
Agent claim from unified report: "Updated next.config.mjs and validated against package.json. All dependencies aligned."
Filesystem watcher found: Next.js updated from 15.0.3 to 15.1.0, but the lockfile wasn't regenerated. The claim was honest: the agent did validate the config. But it didn't know the package manager was configured to allow minor version bumps, so the dev environment would install 15.1.0 while CI would install 15.0.3.
The dual-input bridge flagged this: "Hook events show dependency check passed, but filesystem shows version mismatch in lockfile timestamp vs package.json modification time." This goes to the quality gate instead of being automatically approved.
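The check behind that flag is simple once you frame it as file metadata: if the manifest was modified after the lockfile was last written, the lockfile is stale. A sketch, using npm's file names as an assumed example (any manifest/lockfile pair works the same way):

```python
from pathlib import Path

def lockfile_stale(manifest: str = "package.json",
                   lockfile: str = "package-lock.json") -> bool:
    """True if the manifest changed after the lockfile was regenerated.

    Default file names are an assumption for illustration; the comparison
    itself is just two mtimes.
    """
    m, l = Path(manifest), Path(lockfile)
    if not l.exists():
        return True  # a missing lockfile is its own divergence
    return m.stat().st_mtime > l.stat().st_mtime
```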
Example 2: The Phantom Test Pass
Agent claim: "Created test suite. All 12 tests passed."
Filesystem watcher found: Test file was created. Test artifact file existed. But the artifact was from yesterday's run, not today's. The agent ran the test command, saw exit code 0 from a cached result, reported success, and moved on.
The watcher compared the test artifact's mtime with the task start time. Divergence detected. Escalated.
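That freshness check needs nothing more than an mtime comparison. A sketch (the function name and signature are mine):

```python
from pathlib import Path

def artifact_is_fresh(artifact_path: str, task_start: float) -> bool:
    """A test artifact only counts as evidence if it was written after
    the task started. Catches the phantom pass: exit code 0 from a
    cached result whose file predates the run."""
    p = Path(artifact_path)
    return p.exists() and p.stat().st_mtime >= task_start
```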
Example 3: The Partial File Modification
Agent claim: "Modified src/components/Header.tsx with the changes requested."
Filesystem watcher found: File was modified (✓). But over 95% of the file was identical to the version before. The agent made surgical changes correctly, but the claim was "modified the component," which humans tend to read as "rewrote it." The neutral watcher simply reported: "Header.tsx changed by 4.2% of line count."
The next agent reviewing this code saw the honest statistic and knew exactly how much of the file was actually changed.
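That line-count statistic is easy to compute neutrally with Python's standard `difflib`. A sketch of one way to get such a figure (not necessarily how VNX computes its number):

```python
import difflib

def percent_changed(before: str, after: str) -> float:
    """Share of the new file's lines that differ from the baseline,
    as a percentage -- the kind of neutral statistic behind
    'Header.tsx changed by 4.2% of line count'."""
    old, new = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(None, old, new)
    # Lines that survived unchanged from the old version
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    total = max(len(new), 1)
    return round(100 * (1 - unchanged / total), 1)
```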
These are the gaps where self-reporting is either blind or prone to interpretation mismatch. The External Watcher doesn't judge these situations—it just makes them visible.
Integration with the Broader VNX Pipeline
The Receipt Processor's output feeds directly into the quality gates. The VNX Supervisor monitors all processes:
```bash
# vnx_supervisor_simple.sh watches all receipts and escalates divergences
while true; do
  for receipt in .vnx-data/receipts/*.json; do
    [[ -f "$receipt" ]] || continue  # skip the literal glob when no receipts exist
    if jq -e '.validation == "DIVERGENT"' "$receipt" > /dev/null; then
      # Divergence detected - escalate to quality gate
      TASK_ID=$(jq -r '.task_id' "$receipt")
      echo "ESCALATION: $TASK_ID - Hook events and filesystem diverged" \
        >> .vnx-data/quality-gates/escalations.log
      # Block automatic approval
      echo "BLOCKED" > ".vnx-data/tasks/$TASK_ID/approval_status"
    fi
  done
  sleep 5
done
```

The T0 Orchestrator reviews these escalations. It doesn't just see "approval blocked." It sees the actual divergence data, can pull the agent's full report, can inspect the filesystem diff, and makes an informed decision.
Honest Limitations
This approach isn't perfect, and I want to be clear about what it can't do.
It can't detect logical errors. If an agent modifies a file and the file syntax is correct but the logic is wrong, the watcher will report "file modified successfully." The logical error requires code review or testing at the next gate.
It depends on filesystem coherence. If the underlying storage is flaky or if operations happen too quickly (race conditions), the watcher might miss changes. In practice, this is rare, but it's a theoretical limitation.
It requires stable task isolation. If Agent A and Agent B are modifying the same files simultaneously, the watcher can't reliably attribute changes. This is why VNX enforces task-level file locking.
Hook events add latency. If available, they're useful, but they're not free. They add network I/O and processing overhead. The system gracefully degrades to filesystem-only if hooks become too slow.
False positives are possible but rare. A file might be modified outside the agent's execution (manual edit, concurrent process). The watcher will flag this, and the quality gate team investigates. This is a feature, not a bug—it makes you aware of assumptions your system was making.
The Philosophical Core
The External Watcher Pattern is built on a simple principle: external observation beats self-reporting at scale.
When you have one AI agent handling a simple task, self-reporting works fine. When you have four parallel agents, each spawning sub-agents, each modifying shared infrastructure, each making claims about what they did—self-reporting becomes a liability.
The watcher doesn't care if the agent is trustworthy. It cares about ground truth. It watches the filesystem, compares it against claims, and makes divergences visible.
In the next part of this series, I'll cover how these divergences feed into the async quality gates—the decision points where humans and AI work together to determine whether an agent's work gets approved or escalated.
For now, the key takeaway is this: if you're building multi-agent AI systems, you need to observe them from the outside. Don't trust the agent to audit itself. Build a watcher.
The full VNX orchestration system — including the external watcher, receipt processor, and dual-input bridge — is open source on GitHub.
📖 Read also: From Human-in-the-Loop to Human-on-the-Loop — How the external watcher enables graduated agent autonomy
📖 Read also: Async Quality Gates: Why AI Agents Don't Get to Decide When They're Done — The decision points where humans and AI work together to determine whether an agent's work gets approved
This is Part 6 of the Glass Box Governance series.
📚 Glass Box Governance series
- One Terminal to Rule Them All: How I Orchestrate Claude, Codex, and Gemini Without Them Knowing About Each Other
- Receipts, Not Chat Logs: What 2,472 AI Agent Dispatches Taught Me About Governance
- The Cascade of Doom: When AI Agents Hallucinate in Chains
- Why I Chose NDJSON Over Postgres for My AI Agent Audit Trail
- Claude Agent Teams vs. Building Your Own: What Anthropic Solved (And What They Left Out)
- The External Watcher Pattern: How I Observe AI Agents Without Trusting Their Self-Reports ← you are here
- Why Architecture Beats Models: Lessons from 2400+ AI Agent Dispatches
- Async Quality Gates: Why AI Agents Don't Get to Decide When They're Done
- From Human-in-the-Loop to Human-on-the-Loop: A Production Graduation Path
- Traceability as Architecture: Designing AI Systems Where Every Decision Has a Receipt
- Decision-Making Architecture: Why Autonomous Agents Need Governance, Not Just Instructions
- Context Rotation at Scale: How VNX Keeps AI Agents Honest After 10,000 Dispatches
- Autonomous Agent Patterns: 5 Production-Tested Approaches for Agents That Run Without You
- Governance Scoring: How to Measure Whether Your AI Agent Deserves More Autonomy
Vincent van Deth
AI Strategy & Architecture
I build production systems with AI — and I've spent the last six months figuring out what it actually takes to run them safely at scale.
My focus is AI Strategy & Architecture: designing multi-agent workflows, building governance infrastructure, and helping organisations move from AI experiments to auditable, production-grade systems. I'm the creator of VNX, an open-source governance layer for multi-agent AI that enforces human approval gates, append-only audit trails, and evidence-based task closure.
Based in the Netherlands. I write about what I build — including the failures.