Agent tooling · inside AJO

Agent Dev.

It builds the agents. Every role in the fleet is a contract plus a verification loop here — not a "you are a helpful…" persona.

Forging a role — act, validate, revise
ready
looping
Role contract
scope · output shape · escalation
Output schema deterministic
Required fields deterministic
Citation integrity deterministic
Quote length deterministic
Quality rubric rubric
Cleared every check — role shipped to the registry, hot-reloaded on the next turn.
The runtime acts, validates the output against every check, and feeds the failures back to revise — looping until it clears all of them, or escalating with a structured reason. It never ships an answer that failed a gate.

Agent Dev builds the roles the rest of the fleet runs on. Its stance is the whole thing: a reliable agent is a contract plus a verification loop, not a persona. Instead of coaxing a prompt, you declare what a role owns, what it must output, and the checks it must clear — and the runtime revises until it clears them or escalates with a reason.

54role prompts passing the contract validator
2validator kinds — deterministic and rubric
≥1deterministic check required on every role
2shot-reload of a role change, no restart
01How it works

Standing up a new role.

  • The lead decides the shape — which role a project is missing, what it should own, and what it must refuse.
  • The coder wires it in — adds the role to the registry and creates its inbox and outbox — no plumbing left by hand.
  • The prompt writer authors it — writes the role’s instructions as a tight contract, not a personality.
  • The evaluator tests it — runs the role against a small packet to confirm it behaves as intended before it goes live.
  • Hot-reload, no restart — the registry and prompts are re-read every turn — a role change or model swap takes effect in about two seconds.
02What makes it different

A contract, not a persona.

Every role ships with validators, in two kinds. Deterministic checks enforce hard rules — the output matches the schema, the required fields are there, every citation actually supports its claim, no quote runs past its limit. A rubric check has the model grade the work against named criteria. A rubric alone is never enough: the framework refuses to build a role that doesn't pair it with at least one deterministic check.

When a role can't pass within its iteration budget, it doesn't fabricate — it returns a structured escalation naming the last attempt and the checks that failed. A role that can't clear its own gate says so.

03Forge from failure

A real failure is the input.

When a role fails a live turn, that exact turn becomes the fix. The real task, the real tool-call trace, and the real failure reason go straight to the improver, which rewrites the role's instructions — then runs the offline loop to confirm the new version converges before it's ever put back online. The fleet improves its own roles from what actually broke, not from a hypothetical.

Runs inside AJO. It’s the project that builds and hardens every other role — including the ones behind Tool Engineering and Discovery. Code is private; this page is the record.

← All work