Tool Engineering.
An autonomous factory that takes a qualified pain and carries it to a shipped, tested tool — a human stepping in only at four gates.
This is the other half of the loop. The Pain Point Pipeline finds a pain worth building; the factory builds it. A plain Python conductor — no model in the driver's seat — carries the pain through seven stages, spawning a fresh session for each and advancing only when a deterministic check passes. A human enters at four gates. Everything between them runs itself.
Seven stages, one shipped tool.
- S0 · Intake — a qualified pain handoff arrives and the conductor spins up the product’s own git repo. (Gate G0: I promote it.)
- S1 · Design — a fresh session writes the product direction as a checked artifact. (Gate G1: I approve the direction.)
- S2 · Design-research — the design is researched adversarially; every claim carries a citation, verified by a separate session.
- S3 · Spec — the spec and a feature list become the buildable contract the later stages are held to.
- S4 · Validation — a prototype is built and reviewed. (Gate G2: I kill, pivot, or continue.)
- S5 · Build — a builder session writes the app — and never sees the acceptance tests.
- S6 · Verify + ship — the holdout suite runs; it had to be red before the build existed. (Gate G3: I approve live use and ship.)
No model decides its own grade.
A stage advances when a code predicate confirms its artifacts exist and pass — never because a model reported its own work was good. That one rule is the whole architecture. A gate that can't fail can't gate, so every gate has a failing-case test, including each permission rule.
The builder is blind to the test. The acceptance suite lives outside the product's repo, denied to the build session, with a tripwire that audits the transcript afterward. The build can't teach to a test it never sees. And each stage runs in a fresh session with only the permissions it needs — a read-only stage physically can't write.
Four decisions, mine to make.
The factory runs itself between gates. At four points it stops and waits for a call it shouldn't make alone.
Keeping the conductor dumb.
The conductor is plain Python on purpose — the moment a model decides whether to proceed, the guarantee is gone. The hard part was everything around that: spawning headless sessions that actually write files, permission profiles enforced as deny-complements so a read-only stage stays read-only, and a heartbeat that doesn't kill a product just for waiting at a gate. Every bug got pinned with a failing test — which is why the suite is worth trusting.
Where it runs. Inside AJO, fed by the Pain Point Pipeline. Currently attended, one stage at a time. Code is private; this page is the record.