Question generation · inside AJO / Research

Researcher.

It decides what's worth asking. Given an objective, it generates the research questions a pipeline should pursue — ranked by what their answers would change.

Context → ranked questions (scoring)

Objective: where is a pain worth building?

Ranked questions:
  • #1 Who exactly has this pain, and where do they gather? (0.88)
  • #2 Do they pay for a fix today, and who holds the budget? (0.86)
  • #3 What do they use now, and exactly why does it fall short? (0.80)
  • #4 What changed recently that makes this acute right now? (0.72)

Struck (failed a hard gate):
  • Is the market large and are the incumbents weak? (compound)
  • Why do they all already pay for this? (presupposes spend)

Legend: info gain, relevance to objective, failed a hard gate.
The engine over-generates, strikes anything that fails a hard gate, and ranks the rest by what an answer would change. It decides what to ask — the Pain Point Pipeline answers.

Phrasing a question is easy. Knowing which one is worth asking is the hard part. Given an objective and what's already known, the engine finds the gaps, over-generates candidate questions, throws out the ones that fail a hard quality gate, and ranks what's left by how much an answer would move the decision. It hands those questions off — it doesn't answer them. That's the Pipeline's job.

6 stages from context to ranked questions
4 hard gates every question must pass
2 value signals: info gain × relevance
0 dependencies on the system it runs in
01 How it works

Context in, ranked questions out.

  • Knowledge state — model what's known versus still open before asking anything.
  • Gap detection — find what's worth asking about: the unanswered, the contradictory, the thin, the missing link.
  • Perspective framing — induce distinct lenses so the questions have breadth, not five rewordings of one.
  • Generation — over-generate structured candidate questions, one batch per perspective and gap.
  • Quality gates — hard pass/fail filters (grounded, answerable, presupposition-clean, single-focus), then semantic de-dup.
  • Value scoring — score survivors by information gain and relevance, then select a diverse top-k.
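
The six stages above can be read as one data flow. The sketch below is only an illustration of that flow, not the engine's code (which is private): every function and parameter name is hypothetical, each stage is passed in as a plain callable, de-dup is reduced to normalised exact match as a stand-in for semantic de-dup, and selection is a plain top-k rather than the diversity-aware pick.

```python
from typing import Callable


def run_pipeline(
    objective: str,
    known: list[str],                                   # stage 1: knowledge state (assumed given)
    detect_gaps: Callable[[str, list[str]], list[str]],  # stage 2
    frame_perspectives: Callable[[str], list[str]],      # stage 3
    generate: Callable[[str, str], list[str]],           # stage 4
    passes_gates: Callable[[str], bool],                 # stage 5
    score: Callable[[str], float],                       # stage 6
    k: int = 3,
) -> list[tuple[str, float]]:
    """Hypothetical skeleton: context in, ranked (question, score) pairs out."""
    gaps = detect_gaps(objective, known)
    perspectives = frame_perspectives(objective)

    # Over-generate: one batch per (perspective, gap) pair.
    candidates = [
        q for p in perspectives for g in gaps for q in generate(p, g)
    ]

    # Hard gates, then de-dup. Normalised exact match is a simplification;
    # a semantic de-dup would compare meaning, not strings.
    seen: set[str] = set()
    survivors: list[str] = []
    for q in candidates:
        key = " ".join(q.lower().split())
        if passes_gates(q) and key not in seen:
            seen.add(key)
            survivors.append(q)

    # Score survivors and keep the k highest.
    ranked = sorted(
        ((q, score(q)) for q in survivors),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

Passing the stages in as callables keeps the skeleton free of any one model or data source, which fits the stand-alone design described later on this page.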
02 What makes it different

It ranks by value, not volume.

Every surviving question gets two scores: how much the spread of plausible answers would shift belief, and how directly that answer serves the objective. Give it the decision you're actually making and it damps questions whose answers wouldn't change your choice — it won't chase something interesting but useless.
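
One way to make the two scores concrete, offered as a sketch under stated assumptions rather than the engine's actual formula: treat information gain as the expected drop in entropy over a set of hypotheses once the answer is seen, then multiply by relevance and by a decision-sensitivity factor. The function names, the `decision_sensitivity` parameter, and the choice of entropy in bits are all assumptions for illustration.

```python
import math


def entropy(p: list[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_info_gain(prior: list[float], likelihood: list[list[float]]) -> float:
    """Expected entropy reduction from seeing the answer.

    prior[h] is the belief in hypothesis h.
    likelihood[h][a] is P(answer a | hypothesis h).
    """
    n_hyp = len(prior)
    n_ans = len(likelihood[0])
    gain = entropy(prior)
    for a in range(n_ans):
        p_a = sum(prior[h] * likelihood[h][a] for h in range(n_hyp))
        if p_a == 0:
            continue  # this answer can never occur
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(n_hyp)]
        gain -= p_a * entropy(posterior)
    return gain


def value(
    prior: list[float],
    likelihood: list[list[float]],
    relevance: float,
    decision_sensitivity: float = 1.0,
) -> float:
    """Hypothetical combined score: info gain x relevance, damped when the
    answer would not change the decision (decision_sensitivity near 0)."""
    return expected_info_gain(prior, likelihood) * relevance * decision_sensitivity


# A question whose answer separates two equally likely hypotheses perfectly
# gains one full bit; one whose answers look the same under both gains none.
decisive = expected_info_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
useless = expected_info_gain([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
```

Setting `decision_sensitivity` to 0 zeroes the score however informative the answer is, which is the "interesting but useless" case.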

Selection is diversity-aware: it picks the top few by value while penalizing anything too close to a question already chosen or already asked. You get a short, broad, high-value set — not five paraphrases of the same question.
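
A common way to get this behaviour is a greedy pick in the style of maximal marginal relevance: repeatedly take the question with the best value after subtracting a penalty for its closest match among questions already chosen or already asked. The page does not say which method the engine uses, so treat this as a stand-in. The names, the `penalty` weight, and the token-overlap similarity (in place of a semantic one) are assumptions.

```python
from typing import Callable, Iterable


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_diverse(
    scored: Iterable[tuple[str, float]],
    already_asked: Iterable[str] = (),
    k: int = 3,
    penalty: float = 0.5,
    similarity: Callable[[str, str], float] = jaccard,
) -> list[str]:
    """Greedy top-k that penalises closeness to anything chosen or already asked."""
    pool = dict(scored)               # question -> value
    reference = list(already_asked)   # everything a new pick must differ from
    chosen: list[str] = []

    while pool and len(chosen) < k:
        def adjusted(q: str) -> float:
            closest = max((similarity(q, r) for r in reference), default=0.0)
            return pool[q] - penalty * closest

        best = max(pool, key=adjusted)
        chosen.append(best)
        reference.append(best)
        del pool[best]

    return chosen


candidates = [
    ("who has this pain", 0.90),
    ("who exactly has this pain", 0.88),   # near-paraphrase of the first
    ("do they pay for a fix", 0.80),
]
picked = select_diverse(candidates, k=2)
```

With these toy inputs the near-paraphrase loses to the lower-scored but distinct question, so `picked` holds the first and third candidates.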

03 The gates

A separate model does the judging.

A question is only eligible if it passes every hard gate. The model that writes the questions never grades its own — the judge is a separate call, because self-evaluation is biased.

Grounded
no hallucinated premise — the question stands on something real.
Answerable
it can actually be resolved, not open-ended or unfalsifiable.
Presupposition-clean
it doesn't smuggle in an unverified assumption.
Single-focus
one question, not three compounded into a sentence.
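
The all-gates-must-pass rule and the separate judge can be sketched as follows. This is an assumed shape, not the private implementation: `Judge`, `failed_gates`, and `eligible` are hypothetical names, and `ToyJudge` is a deliberately crude keyword rule standing in for a separate model call so the example runs on its own.

```python
from typing import Protocol

GATES = ("grounded", "answerable", "presupposition_clean", "single_focus")


class Judge(Protocol):
    """Anything that can grade one question against one gate.

    In the design described above this would be a separate model call,
    never the generator grading itself.
    """

    def check(self, question: str, gate: str, context: str) -> bool: ...


def failed_gates(question: str, context: str, judge: Judge) -> list[str]:
    """Return the names of every gate the question fails, in gate order."""
    return [g for g in GATES if not judge.check(question, g, context)]


def eligible(question: str, context: str, judge: Judge) -> bool:
    """A question is eligible only if it fails no gate."""
    return not failed_gates(question, context, judge)


class ToyJudge:
    """Keyword stand-in for a model judge. Far too blunt for real use."""

    def check(self, question: str, gate: str, context: str) -> bool:
        q = question.lower()
        if gate == "single_focus":
            return " and " not in q        # treats any "and" as compounding
        if gate == "presupposition_clean":
            return "already" not in q      # treats "already" as a smuggled premise
        return True                        # grounded / answerable: not modelled here


judge = ToyJudge()
compound = failed_gates("Is the market large and are the incumbents weak?", "", judge)
presupposes = failed_gates("Why do they all already pay for this?", "", judge)
```

Returning the list of failed gates, rather than a bare boolean, keeps the reason a question was struck available for display.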
04 The loop

Ask, read, ask sharper.

On its own the engine does one principled pass. Hand it a way to answer — the Pain Point Pipeline — and it iterates: it asks, reads the evidence that comes back, and asks sharper. The two together are a self-driving research loop.
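
The loop can be sketched in a few lines. Here `ask` plays the role of the question engine and `answer` the role of the Pain Point Pipeline; both are hypothetical callables, and the fixed round limit is an assumed stopping rule, since the page does not describe how the real loop terminates.

```python
from typing import Callable

Evidence = list[tuple[str, str]]  # (question, answer) pairs gathered so far


def research_loop(
    objective: str,
    ask: Callable[[str, Evidence], list[str]],
    answer: Callable[[str], str],
    rounds: int = 3,
) -> Evidence:
    """Ask, read what comes back, and ask again with the new evidence in hand."""
    known: Evidence = []
    for _ in range(rounds):
        questions = ask(objective, known)   # engine sees everything learned so far
        if not questions:
            break                           # nothing left worth asking
        for q in questions:
            known.append((q, answer(q)))    # answering is the pipeline's job
    return known
```

Because `ask` receives the accumulated evidence on every round, each pass can skip what has been answered and aim at what the answers opened up.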

Built to stand alone. The engine knows nothing about the system around it: its entire contract is five small interfaces, and tests verify the decoupling. It runs inside AJO but depends on none of it. Code is private; this page is the record.
