From Human-in-the-Loop to Human-on-the-Loop: A Production Graduation Path

The industry is having the wrong argument about AI autonomy.

One camp says agents should always have a human approving every action. The other says full autonomy is inevitable, so stop resisting. Both are wrong—because they treat autonomy as a binary switch rather than what it actually is: a graduated spectrum that you move along deliberately, with governance infrastructure at every step.

Deloitte's 2026 TMT Predictions put it plainly: "the most advanced businesses will begin to lay the foundation of shifting toward human-on-the-loop orchestration." Not human-out-of-the-loop. Not full autonomy. On the loop—where humans monitor outcomes instead of approving every individual action.

I've been running my AI agent system at that level for months. Not because I'm reckless, but because I built the governance architecture that makes it safe. This post maps the graduation path from human-in-the-loop to human-on-the-loop, and shows what you need at each level to move up without moving into danger.

The Autonomy Spectrum: Five Levels

VNX Orchestration Dashboard: terminal state, quality gates, receipt ledger

Borrowing from self-driving car classification (which the Cloud Security Alliance and others have adapted for AI agents), here's how I think about agent autonomy:

Level 0 — Manual. No AI involvement. You write every line, review every output, make every decision. This is where most teams were in 2023.

Level 1 — Assisted. AI suggests, human decides. Think Copilot completions, ChatGPT drafts you edit. The human is in the loop for every action. Safe, but slow.

Level 2 — Supervised. AI executes defined tasks, human reviews all outputs before they ship. Claude Code writing a function, you reading the diff. The agent does real work, but nothing moves forward without explicit approval.

Level 3 — Monitored. AI operates autonomously within boundaries. Humans review outcomes, not individual actions. The agent handles a complete dispatch—research, write, validate—and the human checks the result, not each step. This is human-on-the-loop.

Level 4 — Autonomous. AI handles most work independently. Human intervention is exception-based: the system flags anomalies, and only those get human review. The agent knows its own limits and escalates when uncertain.

Level 5 — Full Auto. End-to-end autonomous operation. No human in any loop. Currently theoretical for anything with real-world consequences.

The autonomy spectrum from Level 0 to Level 5

Most teams today are stuck oscillating between Level 1 and Level 2. The jump to Level 3 is where the real value lives—and where most governance frameworks fail.

Why Most Teams Get Stuck at Level 1

The reason is straightforward: they don't trust their agents, and they have no mechanism to build trust incrementally.

Without governance infrastructure, the only options are "approve everything" or "hope for the best." And since hope is not a strategy, responsible teams default to Level 1. They use AI as a fancy autocomplete. Every suggestion gets human review. Every output gets manually validated.

This works. But it doesn't scale. At Level 1, you're paying for AI compute and still doing all the cognitive work of verification. The throughput gains are marginal. The cost savings are minimal. And the human becomes the bottleneck in a system that was supposed to remove bottlenecks.

The problem isn't the team's caution—it's the absence of infrastructure that would let them be less cautious safely. You can't graduate to higher autonomy through willpower. You graduate through architecture.

The Graduation Requirements

Each level transition requires specific governance capabilities. Skip one and you'll either regress to a lower level or—worse—operate at a level your infrastructure can't support.

Level 1 → Level 2: Structured outputs. The agent needs to produce machine-parseable deliverables, not just prose. If you can't programmatically verify what the agent produced, you can't move beyond manual review. In my system, this means every dispatch produces structured NDJSON receipts, not chat messages. The receipt ledger is the foundation everything else builds on.

Level 2 → Level 3: Automated verification. This is the critical jump. You need a system that can verify agent outputs without human involvement for the common case. My async quality gates handle this—automated checks that analyze deliverables, assign risk scores, and only escalate to human review when something trips a threshold.

Here's the actual quality gate configuration from my production system:

yaml
# VNX Quality Gate Configuration — production
quality_gates:
  file_existence:
    required: true
    description: "Every deliverable file must exist on disk"
  test_pass:
    run_affected_tests: true
    require_green: true
    test_runner: "npm test"
    total_tests: 63        # all must pass before auto-approve
  function_size:
    max_lines: 32          # no function exceeds 32 lines
    action_on_fail: "reject_with_refactor_brief"
  secrets_check:
    scan_for: ["API_KEY", "SECRET", "PASSWORD", "TOKEN"]
    hardcoded_allowed: false
    action_on_fail: "block_immediately"
  complexity_metrics:
    max_cyclomatic: 15
    max_nesting_depth: 4
  risk_scoring:
    low: 0-40              # auto-approve, dispatch closes
    medium: 41-70          # approve with follow-up task
    high: 71-85            # T0 human review required
    critical: 86+          # block and alert

When the quality gate sees a risk score under 40, the dispatch closes automatically. Between 41 and 70, it closes but creates a follow-up task. Above 70, it stops and waits for my review. This is the mechanism that makes Level 3 possible: most work flows through without me, but nothing dangerous ships without my eyes on it.

Level 3 → Level 4: Independent observation. At Level 3, you're checking outcomes. At Level 4, you need a separate system watching the watcher. My external watcher provides this—a dual-input bridge that compares what the agent claims it did against what actually changed in the filesystem. When these diverge, it flags it before the quality gate even runs.

Reject-and-retry flow: hoe de orchestrator werk terugwijst met specifieke feedback

Level 4 → Level 5: Not yet. Full autonomy without any human oversight requires something no current AI system can provide: reliable self-awareness of its own failure modes. Until agents can genuinely know what they don't know—not just pattern-match against uncertainty prompts—Level 5 remains theoretical.

My Production Setup: Level 3 Going on 4

Here's where my system sits today in concrete terms.

A typical dispatch flows through the VNX orchestration like this:

  1. Task enters the queue — structured brief with acceptance criteria, dispatched via the orchestrator
  2. Agent executes independently — research, writing, code generation, whatever the task requires. No human approval at each step.
  3. Receipt is generated — NDJSON record with timestamp, deliverables manifest, token usage, model used, and execution metadata
  4. Quality gate runs — automated analysis of all deliverables against configured thresholds
  5. Decision point:
    • Risk score < 40 → auto-approved, dispatch closes
    • Risk score 41-70 → approved with follow-up task created
    • Risk score > 70 → blocked for human review
jsonl
{"dispatch_id":"d-2026-0317-0847","worker":"t2-content","status":"completed","risk_score":28,"decision":"auto_approve","quality_checks":{"file_analysis":"pass","complexity":"pass","integration":"pass"},"token_usage":{"input":12847,"output":3291},"model":"claude-sonnet-4-20250514","duration_s":34,"timestamp":"2026-03-17T08:47:12Z"}
{"dispatch_id":"d-2026-0317-0912","worker":"t1-research","status":"completed","risk_score":62,"decision":"approve_with_followup","quality_checks":{"file_analysis":"pass","complexity":"warning","integration":"pass"},"followup":"review-complexity-t1-0912","timestamp":"2026-03-17T09:12:44Z"}
{"dispatch_id":"d-2026-0317-0938","worker":"t2-code","status":"blocked","risk_score":84,"decision":"human_review","quality_checks":{"file_analysis":"pass","complexity":"fail","integration":"warning"},"blocked_reason":"cyclomatic_complexity_exceeded","timestamp":"2026-03-17T09:38:07Z"}

In the last 30 days of production operation, roughly 73% of dispatches auto-approve at Level 3. Another 21% approve with follow-up tasks. Only about 6% require my direct intervention—and those are genuinely the ones that need human judgment: architectural decisions, ambiguous requirements, edge cases the gate correctly identified as risky.

That 6% is the key number. It means I'm spending my time on the 6% that actually benefits from human reasoning, instead of burning attention on the 94% that doesn't.

The Governance Architecture That Makes This Safe

The question I get most often: "How do you sleep at night letting agents run without approval?"

The answer is three interlocking systems, each described in detail earlier in this series:

1. The Receipt Ledger — Every agent action produces an immutable, append-only NDJSON record. Not a chat log. Not a summary. A structured receipt with deliverables manifest, token counts, model attribution, and execution metadata. If something goes wrong, I can trace exactly what happened, when, with which model, and what it produced.

2. Async Quality Gates — Automated verification that runs on every dispatch completion. The agent doesn't get to say "done"—the gate decides if the work meets the configured standards. Risk scoring determines the disposition: auto-approve, approve-with-followup, or block-for-human.

3. The External Watcher — A separate observation layer that monitors agent behavior through filesystem watching, independent of what the agent reports. It compares claimed actions against actual changes. Divergence triggers alerts before the quality gate even evaluates the output.

These three layers create a trust architecture. Not blind trust—verified trust. The system earns autonomy by demonstrating reliability, and it can lose autonomy if patterns shift. If the auto-approve rate drops below 65% for a worker, I investigate. If a particular task type consistently scores above 70, I tighten the boundaries or restructure the dispatch pattern.

This is what Deloitte means by "laying the foundation" for human-on-the-loop. It's not about removing humans. It's about building the infrastructure that lets humans supervise outcomes instead of approving inputs—because the governance layer handles the verification that humans used to do manually.

Where This Goes Next

The path from Level 3 to Level 4 is about expanding the boundary conditions. Right now, my agents operate autonomously within well-defined task types. Level 4 means they can handle novel task types—ones I haven't explicitly configured—by reasoning about their own uncertainty and escalating when they're outside their training distribution.

That requires better uncertainty quantification than current models provide. But the governance infrastructure is already in place. When the models catch up, the graduation is incremental, not architectural.

That's the whole point of building governance first: architecture beats models. Models improve every quarter. The governance layer you build today will outlast every model version change, because it doesn't depend on any specific model's capabilities. It depends on structured outputs, automated verification, and independent observation—patterns that work regardless of whether the agent underneath is Claude, GPT, or something that doesn't exist yet.

If you're still at Level 1, that's fine. But start building toward Level 2. Get your outputs structured. Get your verification automated. The jump from 2 to 3 is where agents stop being expensive autocomplete and start being actual leverage.

And if you're designing an AI architecture for production agent systems, start with the governance layer. Everything else—model selection, prompt engineering, orchestration patterns—is secondary to the question: "How will I verify that this system is doing what I think it's doing?"

Answer that first. The autonomy will follow.


This post is part of the Glass Box Governance series.

Previous: Async Quality Gates — Why should the system, not the agent, decide when work is done?


📚 Glass Box Governance series

  1. One Terminal to Rule Them All: How I Orchestrate Claude, Codex, and Gemini Without Them Knowing About Each Other
  2. Receipts, Not Chat Logs: What 2,472 AI Agent Dispatches Taught Me About Governance
  3. The Cascade of Doom: When AI Agents Hallucinate in Chains
  4. Why I Chose NDJSON Over Postgres for My AI Agent Audit Trail
  5. Claude Agent Teams vs. Building Your Own: What Anthropic Solved (And What They Left Out)
  6. The External Watcher Pattern: How I Observe AI Agents Without Trusting Their Self-Reports
  7. Why Architecture Beats Models: Lessons from 2400+ AI Agent Dispatches
  8. Async Quality Gates: Why AI Agents Don't Get to Decide When They're Done
  9. From Human-in-the-Loop to Human-on-the-Loop: A Production Graduation Path ← you are here
  10. Traceability as Architecture: Designing AI Systems Where Every Decision Has a Receipt ← you are here
  11. Decision-Making Architecture: Why Autonomous Agents Need Governance, Not Just Instructions
  12. Context Rotation at Scale: How VNX Keeps AI Agents Honest After 10,000 Dispatches
  13. Autonomous Agent Patterns: 5 Production-Tested Approaches for Agents That Run Without You
  14. Governance Scoring: How to Measure Whether Your AI Agent Deserves More Autonomy

Vincent van Deth

AI Strategy & Architecture

I build production systems with AI — and I've spent the last six months figuring out what it actually takes to run them safely at scale.

My focus is AI Strategy & Architecture: designing multi-agent workflows, building governance infrastructure, and helping organisations move from AI experiments to auditable, production-grade systems. I'm the creator of VNX, an open-source governance layer for multi-agent AI that enforces human approval gates, append-only audit trails, and evidence-based task closure.

Based in the Netherlands. I write about what I build — including the failures.

Reacties

Je e-mailadres wordt niet gepubliceerd. Reacties worden beoordeeld voor plaatsing.

Reacties laden...