Agent tooling · inside AJO

Agent Dev.

It builds the agents. Every role in the fleet is a contract plus a verification loop here — not a "you are a helpful…" persona.

Forging a role — act, validate, revise

ready

looping

Role contract

scope · output shape · escalation

• Output schema deterministic

• Required fields deterministic

• Citation integrity deterministic a cited source didn’t back its claim

• Quote length deterministic a quote ran past its limit

• Quality rubric rubric below the quality bar — tighten

✓ Cleared every check — role shipped to the registry, hot-reloaded on the next turn.

The runtime acts, validates the output against every check, and feeds the failures back to revise — looping until it clears all of them, or escalating with a structured reason. It never ships an answer that failed a gate.

Agent Dev builds the roles the rest of the fleet runs on. Its stance is the whole thing: a reliable agent is a contract plus a verification loop, not a persona. Instead of coaxing a prompt, you declare what a role owns, what it must output, and the checks it must clear — and the runtime revises until it clears them or escalates with a reason.

54role prompts passing the contract validator

2validator kinds — deterministic and rubric

≥1deterministic check required on every role

2shot-reload of a role change, no restart

01How it works

Standing up a new role.

The lead decides the shape — which role a project is missing, what it should own, and what it must refuse.
The coder wires it in — adds the role to the registry and creates its inbox and outbox — no plumbing left by hand.
The prompt writer authors it — writes the role’s instructions as a tight contract, not a personality.
The evaluator tests it — runs the role against a small packet to confirm it behaves as intended before it goes live.
Hot-reload, no restart — the registry and prompts are re-read every turn — a role change or model swap takes effect in about two seconds.

02What makes it different

A contract, not a persona.

Every role ships with validators, in two kinds. Deterministic checks enforce hard rules — the output matches the schema, the required fields are there, every citation actually supports its claim, no quote runs past its limit. A rubric check has the model grade the work against named criteria. A rubric alone is never enough: the framework refuses to build a role that doesn't pair it with at least one deterministic check.

When a role can't pass within its iteration budget, it doesn't fabricate — it returns a structured escalation naming the last attempt and the checks that failed. A role that can't clear its own gate says so.

03Forge from failure

A real failure is the input.

When a role fails a live turn, that exact turn becomes the fix. The real task, the real tool-call trace, and the real failure reason go straight to the improver, which rewrites the role's instructions — then runs the offline loop to confirm the new version converges before it's ever put back online. The fleet improves its own roles from what actually broke, not from a hypothetical.

Runs inside AJO. It’s the project that builds and hardens every other role — including the ones behind Tool Engineering and Discovery. Code is private; this page is the record.

← All work