Three NDJSON Streams, Three Jobs, Zero Corruption: The Case for Log-Shaped State

Most multi-agent frameworks store state in a database. Foreign keys, transactions, an ORM, schema migrations, and a connection pool. The choice feels obvious: state is structured, you want to query it, what else would you use?

After six months of running VNX in production with three append-only NDJSON files and a JSON cache rebuilt on demand, I am convinced the database is the wrong default for AI orchestration. The full evolution from tmux pane to autonomous orchestrator shows the journey that led here.

This post is the why. What you give up, what you gain, and where the architecture breaks if you push it too far.

The premise: three streams, each scoped

VNX has three NDJSON streams. Each writes independently. Each owns one job. A malformed write to one cannot corrupt the others.

Stream 1, t0_receipts.ndjson, agent actions. Every dispatch lifecycle event a worker emits: task_started, task_complete, task_failed, task_timeout, gate_decisions, lease transitions. 6,000+ entries across the parent SEO-tool repo and standalone VNX repo combined, after eight months of daily use.

Stream 2, governance_audit.ndjson, system decisions. Every gate decision, dispatcher decision, escalation, or state mutation that crosses a governance boundary. PR #261 enforced dispatch_id + pr_number stamping; PR #273 fixed reader/writer path drift.

Stream 3, dispatch_register.ndjson, the canonical lifecycle timeline. dispatch_created, dispatch_promoted, dispatch_started, gate_passed, gate_failed, dispatch_completed, dispatch_rejected. PR #277 added the writer; PR #283 made the canonical aggregator.

Receipts answer "what did the agent do?" Audit log answers "what did the system decide about it?" Register answers "where is each dispatch in its lifecycle right now?" Three questions, three streams, no contention.

Three NDJSON streams architecture
Three independent append-only writers. Independent readers (SSE, aggregator, replay). Each stream owns one slice of the truth.

Why append-only NDJSON beats a database for this

Five real reasons, in order of how much each one has saved my bacon.

1. Crash resilience is free

A partial write of one NDJSON line cannot corrupt prior lines. Every line is self-contained: open the file in any editor, every line that ends with \n is valid JSON, every line that does not is the (possibly partial) tail.

A database crash mid-transaction needs WAL replay, lock recovery, sometimes manual intervention. An NDJSON crash needs nothing. The next writer appends after the last newline.

In six months of VNX I have had four hard crashes (laptop sleep + USB-C disconnect). Zero data loss. The dispatcher restarts via the unified supervisor, the receipt processor sweeps, and the system rebuilds derived state from receipts. Free recovery.

2. Unix tools work natively

bash
# How many task_complete events for terminal T1 in the last hour?
jq -c 'select(.event_type == "task_complete" and.terminal == "T1")' \
.vnx-data/state/t0_receipts.ndjson \
  | tail -100

# Cost per dispatch over the last 24 hours
jq -r '[.dispatch_id,.metadata.cost_est] | @csv' \
.vnx-data/state/t0_receipts.ndjson \
  | tail -50

No SQL client. No connection pool. No "let me check if there's a query for that." jq + grep + awk + tail covers 90% of operational queries instantly. The other 10% need a Python script over the same files, still no database.

3. Streamability for live dashboards

PR #304 added /api/register-stream SSE, a dashboard tab streams the dispatch register live, can disconnect/reconnect without losing events, and replays from the file on first connect.

python
# Conceptual SSE handler
def stream_register(since_ts):
    with open('.vnx-data/state/dispatch_register.ndjson') as f:
        for line in f:
            event = json.loads(line)
            if event['ts'] >= since_ts:
                yield f"data: {line}\n\n"
        # Then tail -f equivalent for new lines

No CDC tooling. No log shipping. No pub/sub. The file is the stream.

4. Schema evolution is forgiving

Add a new field to the receipt schema? Old receipts do not have it; new receipts do. Readers handle missing fields with defaults. No migration. No ALTER TABLE lock. No downtime.

This matters more than I expected. Across six months I added instruction_sha256 (PR #309), token tracking fields (PR #307), stuck_event_count (PR #310). Each addition was a Python tuple update, not a schema migration. The historical receipts stayed valid, queryable, and untouched.

5. The whole archive is auditable in 30 seconds

bash
# Audit: every gate decision in March 2026 that resulted in failure
jq -c 'select(.event_type == "gate_decision" and.ts > "2026-03-01" and.ts < "2026-04-01" and.status == "failed")' \
.vnx-data/state/governance_audit.ndjson

Show that to a compliance reviewer. They believe it because they can run it themselves. SQL queries against a database require trust in the query, the indexes, the schema. NDJSON queries are just bytes on disk.

The mental model: derived state is a cache

The non-obvious move is treating every JSON file other than the streams as derived state. t0_state.json, pr_queue_state.json, the lease summary tables, all caches. All rm-able.

scripts/build_t0_state.py (974 LOC) is the single canonical rebuilder. It replays the receipt ledger, applies the projection, writes t0_state.json. PR #275 added auto-rebuild triggered on state_mutation_receipt. PR #287 expanded triggers.

If I wanted to change the projection, say, add a new aggregate metric, I write a new field in the rebuilder and run it once. Every historical event re-projects through the new logic. No backfill scripts. No data migration. No "oh wait, the historical data does not have that metric." It does. The receipts had it all along.

I wrote about this principle in detail, receipts as the database, every other file as a cache.

Where the architecture breaks

Honest section. Three places where log-shaped state struggles.

Multi-stream JOIN queries

"Show me every dispatch where Codex's gate failed AND the worker was T2 AND the duration was over 5 minutes", that needs joining receipts (worker + duration) with audit log (gate decision). In Postgres: one query. With NDJSON: a Python script that streams both files and joins by dispatch_id.

For ad-hoc analytics this is fine. For a real-time dashboard with 50 such queries per page load, you would build aggregator caches, which is what we already do, but at scale you might want a real OLAP store.

Hot-path read latency

Reading the latest 10 entries from a 10-MB NDJSON file is fast. Reading them from a 1-GB file is slower. SSE streams handle this with archive rotation, once a dispatch finishes, its event archive is moved to archive/{terminal}/{dispatch_id}.ndjson and the live file is truncated (docs/operations/EVENT_STREAMS.md).

Compaction is non-optional past a certain volume. PR #299's compact_state.py runs nightly with three modes: rotate intelligence archive >50MB, cap receipts at 10K, evict open-items digest entries >30d.

After 391 MB of intelligence archive, the SSE streaming choked. Thirty lines of Python in a cron drained the swamp. Boring, necessary, the kind of feature nobody writes a blog about until it breaks.

Concurrent writers from the same process

NDJSON is safe under POSIX append semantics: open(path, 'a') writes are atomic if each write is smaller than PIPE_BUF (typically 4096 bytes on Linux/macOS). Receipts are well under that. Audit log entries are well under that.

But: if you have multiple processes appending without O_APPEND discipline, you get interleaved bytes and corrupted lines. We had one bug in PR #261 era where a writer used f.write() without explicit O_APPEND flag and a race produced garbled lines on macOS. The fix was the explicit flag.

This is the kind of trap a database hides from you. With NDJSON you have to know about O_APPEND. Worth the cost in our setup. Not free.

Where it shines: replay, audit, postmortem

The argument for this architecture lands hardest in three places.

Replay. Every dispatch's tool-use timeline can be replayed from disk. The Mission Control dashboard's event replay tab reads archive/{terminal}/{dispatch_id}.ndjson and renders a phase-colored, scrubbable timeline. Behavioral postmortems take minutes, not hours.

Audit. "Show me every commit produced by agent T2 in March that touched the auth module." That is a one-liner with jq + grep over the receipts. With a database you would still write a query, but the receipts version requires zero special schema knowledge.

Postmortem. When the dispatcher silently exited on April 28 and left six stale leases, the postmortem was a jq over the audit log to find the last successful sweep, plus an event archive replay to see what was running. Twenty minutes from "what happened" to "here is the timeline". A database backed by the same data would not have been faster.

Anti-claims

A few things I will not claim.

This is not a universal pattern. For OLTP workloads with concurrent multi-row updates and integrity constraints, NDJSON is the wrong choice. AI orchestration happens to be append-mostly, lifecycle-driven, and audit-heavy, that is exactly where log-shaped state wins.

Single-machine assumption. VNX runs on one operator's box. Replicating NDJSON across machines for HA is a project I have not done. There are good patterns (rsync + last-write-wins, or a real distributed log like Kafka), but I will not pretend the single-machine case scales linearly.

Schema discipline is on you. Database schemas are enforced by the engine. NDJSON schemas are enforced by you. PR #322 (canonicalizing the gate result schema) was a class of bug, verdict versus status keys, that a Postgres CHECK constraint would have caught. We caught it through schema drift bites.

What it changes in practice

Six months in, what I notice most is what I no longer have to do.

I do not write migrations. I do not run a database. I do not configure a connection pool. I do not worry about transaction isolation. I do not run pg_dump for backups (rsync of .vnx-data/ is the backup). I do not pay for a managed Postgres service.

What I do instead: I think about schemas before I add a field. I write aggregator scripts for derived state. I run compact_state.py nightly. I write integration tests that read the actual NDJSON output (not mocked DB rows).

The trade is favorable. For AI orchestration, by a wide margin.

Read also: Glass-Box Governance: receipts as the database, the canonical principle this architecture is built on.

For teams designing similar systems: I work on production-grade AI architecture.


Want to dig deeper into VNX architecture? The repo is open source. Issues and PRs welcome. Or connect on LinkedIn for the build-in-public updates.


Sources & references

  1. VNX repo
  2. VNX docs/operations/EVENT_STREAMS.md
  3. PRs: #261 (audit stamping), #273 (audit path drift), #277 (register writer), #283 (register aggregator), #299 (compact_state), #304 (SSE register), #307 (token tracking), #309 (instruction hash), #310 (worker health → events), #322 (gate schema canonicalization)
  4. Related: Glass-Box Governance, receipts as database

Vincent van Deth

AI Strategy & Architecture

I build production systems with AI — and I've spent the last six months figuring out what it actually takes to run them safely at scale.

My focus is AI Strategy & Architecture: designing multi-agent workflows, building governance infrastructure, and helping organisations move from AI experiments to auditable, production-grade systems. I'm the creator of VNX, an open-source governance layer for multi-agent AI that enforces human approval gates, append-only audit trails, and evidence-based task closure.

Based in the Netherlands. I write about what I build — including the failures.

Reacties

Je e-mailadres wordt niet gepubliceerd. Reacties worden beoordeeld voor plaatsing.

Reacties laden...