I Started a Joke Helper a Year Ago. Today I Open-Source It as a Control Plane for AI Coding CLIs.

A year ago I bolted a small helper onto another project. For fun. No plan, no framework in mind. One thing annoyed me, so I automated it.

What kept me up was not whether the AI could write code. It could do that already. It was a different question: what exactly did it do, who looked at it, and can I reconstruct that a month from now?

So I followed that one question. Evening after evening. Solo.

Today that helper is VNX, and I am open-sourcing it as 1.0.

What it actually is

📖 Read also: Glass-Box Governance: why your AI orchestrator should treat the receipt ledger as the database (the governance model VNX 1.0 is built on)

VNX is a local control plane for AI coding CLIs. Not a framework you import. A runtime that drives the CLI tools already on your machine.

Each piece of work becomes a dispatch. Each dispatch gets an isolated tmux worker and its own git worktree. Work passes through human and reviewer gates. Every dispatch produces an append-only, hash-chained NDJSON receipt, so there is a forensic trail from dispatch to review to merge.
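
To make "append-only, hash-chained" concrete, here is a minimal sketch of the idea, not VNX's actual receipt format. The field names (prev_hash, payload, hash), the all-zero genesis value, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first receipt


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash stable.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_receipt(lines: list[str], payload: dict) -> str:
    """Append one NDJSON line that commits to the previous line's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    body = {"prev_hash": prev_hash, "payload": payload}
    record = {**body, "hash": _digest(body)}
    line = json.dumps(record, sort_keys=True)
    lines.append(line)
    return line


def verify_chain(lines: list[str]) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    expected_prev = GENESIS
    for line in lines:
        record = json.loads(line)
        body = {"prev_hash": record["prev_hash"], "payload": record["payload"]}
        if record["prev_hash"] != expected_prev:
            return False
        if record["hash"] != _digest(body):
            return False
        expected_prev = record["hash"]
    return True


# Hypothetical dispatch lifecycle: dispatch, review, merge.
ledger: list[str] = []
append_receipt(ledger, {"dispatch": "d-001", "event": "dispatched"})
append_receipt(ledger, {"dispatch": "d-001", "event": "reviewed"})
append_receipt(ledger, {"dispatch": "d-001", "event": "merged"})
```

Editing any earlier line changes its hash, which breaks the link stored in the next line, so verify_chain reports the tampering without needing to trust whoever holds the file.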

It drives Claude Code, Codex CLI, Gemini CLI, Kimi CLI, OpenRouter, and local Ollama. Not through vendor SDK imports. The runtime drives the provider CLIs directly, and provider constraints live in a machine-readable YAML so routing can pick a capable lane per dispatch.
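
As a rough sketch of what constraint-driven routing can look like, the snippet below filters lanes against a dispatch's needs. The constraint data is written as a Python dict standing in for a parsed YAML file, and every lane name, field name, and value is hypothetical rather than taken from VNX.

```python
from dataclasses import dataclass

# Hypothetical parsed form of a provider-constraint file.
# Lane names, fields, and numbers are illustrative only.
CONSTRAINTS = {
    "lane-a": {"interactive": True, "max_context_tokens": 200_000, "can_review": False},
    "lane-b": {"interactive": False, "max_context_tokens": 128_000, "can_review": True},
    "lane-c": {"interactive": False, "max_context_tokens": 8_000, "can_review": False},
}


@dataclass(frozen=True)
class Dispatch:
    needs_interactive: bool
    context_tokens: int
    is_review: bool


def capable_lanes(dispatch: Dispatch, constraints: dict) -> list[str]:
    """Return the lanes whose declared constraints satisfy the dispatch."""
    result = []
    for name, c in constraints.items():
        if dispatch.needs_interactive and not c["interactive"]:
            continue  # lane cannot run as an interactive session
        if dispatch.context_tokens > c["max_context_tokens"]:
            continue  # dispatch context would not fit
        if dispatch.is_review and not c["can_review"]:
            continue  # lane is not allowed to act as reviewer
        result.append(name)
    return sorted(result)
```

Keeping the constraints as data rather than code means a change in provider terms becomes an edit to one file, and the router only ever asks "which lanes can take this dispatch?".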

The gap I kept hitting: everyone is building AI agents. Almost nobody is building a control plane for the AI CLI tools already installed on your machine. That is the thing I needed, so that is what I built.

What works today

I am going to be precise about maturity, because a launch is the wrong place to overclaim. These pieces are in production and have evidence behind them:

  • Append-only, hash-chained NDJSON receipts. Thousands of them in my own ledger. Tamper-evident by chain, not by trust.
  • A multi-CLI provider hub. Claude, Codex and Kimi all produce receipts in the same shape. No vendor SDK imports.
  • Review gates. Codex and Gemini review dispatches before merge. I gated this release's own pull requests through them.
  • Per-dispatch git worktree isolation, with a teardown classifier that distinguishes clean, committed, and dirty worktrees instead of treating cleanup as one state.
  • An interactive tmux worker lane, a provider-constraint YAML as the single source of truth, deterministic context injection with a repo map, and cost tracking per provider.
  • Memory in two tenses that hold up: the receipt ledger is the past, a SQLite WAL database is the live present.
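
The teardown classifier above can be pictured as a small pure function. This is a sketch under assumed definitions, not VNX's classifier: "dirty" means uncommitted changes, "committed" means no uncommitted changes but commits that are not yet on the base branch, and "clean" means neither. The function name and enum are hypothetical. The inputs are the text of `git status --porcelain` and a count such as `git rev-list --count <base>..HEAD` would give.

```python
from enum import Enum


class WorktreeState(Enum):
    CLEAN = "clean"          # nothing to lose, safe to remove
    COMMITTED = "committed"  # work exists only as commits on the worktree branch
    DIRTY = "dirty"          # uncommitted or untracked changes are present


def classify_worktree(porcelain_status: str, commits_ahead: int) -> WorktreeState:
    """Classify a worktree from `git status --porcelain` text and a commit count.

    Assumed definitions: any porcelain output means uncommitted work, and
    commits_ahead counts commits on HEAD that the base branch does not have.
    """
    if porcelain_status.strip():
        return WorktreeState.DIRTY
    if commits_ahead > 0:
        return WorktreeState.COMMITTED
    return WorktreeState.CLEAN
```

Checking for dirty first is deliberate: a worktree with both commits and uncommitted edits should be treated as the riskier state.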

One honest caveat inside this tier: provider maturity varies per lane, and worktree creation is solid for a single dispatch but races under parallel creation. I know exactly where that edge is, because I hit it building this.

📖 Read also: Why architecture beats models: lessons from 2400+ AI agent dispatches (the dispatch-level evidence behind VNX's design decisions)

What is opt-in and still burning in

These ship, but off by default. They work mechanically. They are not battle-tested enough for me to call them proven:

  • Cost-aware routing that picks the cheapest capable model per dispatch.
  • An elastic worker pool.
  • A track layer for planning work, and a roadmap autopilot that advances features under human gates. The autopilot ships dark.
  • A self-learning loop that consolidates past review findings into context for future dispatches.
  • The future tense of memory: a roadmap graph the system can plan against.
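
For the first item in that list, "cheapest capable model" reduces to a filter followed by a minimum. The sketch below assumes a single capability check (context size) and a flat price per million tokens. The model names, limits, and prices are invented for illustration and do not describe any real provider or VNX's routing code.

```python
from typing import Optional

# Hypothetical model table: names, limits, and prices are made up.
MODELS = [
    {"name": "small", "max_context_tokens": 8_000, "usd_per_mtok": 0.5},
    {"name": "medium", "max_context_tokens": 128_000, "usd_per_mtok": 3.0},
    {"name": "large", "max_context_tokens": 200_000, "usd_per_mtok": 15.0},
]


def cheapest_capable(context_tokens: int, models: list[dict]) -> Optional[dict]:
    """Pick the lowest-priced model that can hold the dispatch's context."""
    capable = [m for m in models if m["max_context_tokens"] >= context_tokens]
    if not capable:
        return None  # no model fits; the caller has to decide what to do
    # Tie-break on name so the choice is deterministic.
    return min(capable, key=lambda m: (m["usd_per_mtok"], m["name"]))


def estimated_cost_usd(context_tokens: int, model: dict) -> float:
    """Rough input-side cost estimate for one dispatch."""
    return context_tokens / 1_000_000 * model["usd_per_mtok"]
```

Returning None instead of silently falling back to the most expensive model keeps the routing decision visible to whoever is gating the dispatch.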

If you turn these on, treat them as experimental. I do.

What is designed but not built

True parallel execution of multiple feature tracks at once. I designed it this week with three independent model reviews, and the design is honest about what is missing: a wave scheduler, a merge lease, and verified file-scope enforcement. It is on the roadmap. It does not work yet, and I am not going to pretend otherwise.

Where this came from

VNX was incubated inside a private product repository before it became its own project. The early evolution is not publicly replayable commit by commit. The public repository is the extraction, hardening, and packaging.

I wrote that history down before launch, in the repository, as an evolution timeline. The casual tinkering started about a year ago. The serious architecture work is roughly six months old. The public repo is about three and a half months old and 1,027 commits in. A project about provenance should be able to show its own.

The timely part

On June 15, 2026, Anthropic moves headless claude -p and Agent SDK usage to API credits, while interactive Claude Code stays on the subscription. VNX's default Claude lane runs as interactive tmux operation, so it is built around the interactive product path rather than the headless SDK path. That has cost consequences. I am not claiming it is a loophole or future-proof. It is a consequence of using CLIs as workers, and vendors can change terms.

What this is not

It is not a security sandbox. It isolates work with tmux sessions and git worktrees, which keeps dispatches from stepping on each other but does not contain hostile code. It is not compliance certification. It produces a local forensic audit trail, not a legal attestation. It is optimized for human-gated coding workflows, not fully autonomous merges.

Try it

It is pip-installable, has 6,200+ tests, and ships sample receipts you can inspect as NDJSON. It is open source because the architecture is portable, not because I am running a company on it.

I would most value feedback on the audit model, the provider-constraint YAML, and whether the install path makes sense from a clean machine.

The best thing I built started as a joke I could not let go of. Here it is.

The VNX repo is open source. If you want to build something similar for your own organisation, see my AI architecture work.

Vincent van Deth

AI Strategy & Architecture

Vincent van Deth builds production systems with AI for small and medium-sized businesses. He is the creator of VNX, a multi-agent LLM orchestrator, and helps teams ship reliable AI automation, without the bullshit.
