LLM output quality degrades as the context window fills up. A session that accumulates 50 tool calls, research notes, back-and-forth discussion, and thousands of lines of code will produce worse results towards the end than it did at the beginning. The model starts reusing variable names from earlier in the conversation, forgetting constraints, and producing code that contradicts what it wrote a few steps back.

In this blog post we will explore building a small framework called “driv” that addresses this by splitting work across isolated agents. Three agents, five commands, and a Node.js CLI, about 1,000 lines in total.

The patterns are transferable to any system where multiple autonomous processes need to collaborate on a shared task. This post walks through the full implementation: how agents get defined, how they share information through files, how the CLI routes state between them, and how lifecycle hooks provide visibility. By the end, you should have enough detail to build your own.

How Claude Code discovers agents and commands

The framework plugs into two extension points that Claude Code provides: commands and agents.

Claude Code looks for commands in ~/.claude/commands/. A markdown file placed at ~/.claude/commands/driv/plan-phase.md becomes the slash command /driv:plan-phase. The file’s frontmatter declares metadata (description, allowed tools, argument hints), and the body contains step-by-step instructions that Claude follows when the command is invoked.

Agents work similarly. A markdown file at ~/.claude/agents/driv-researcher.md registers an agent named driv-researcher. When a command spawns a subagent with subagent_type: "driv-researcher", Claude Code loads that file as the agent’s system prompt and gives it a fresh 200k-token context window.

The framework’s install script copies everything into these locations:

mkdir -p "$CLAUDE_DIR/commands/driv"
mkdir -p "$CLAUDE_DIR/agents"
mkdir -p "$CLAUDE_DIR/driv/bin"
mkdir -p "$CLAUDE_DIR/driv/templates"

cp commands/driv/*.md   "$CLAUDE_DIR/commands/driv/"
cp agents/*.md          "$CLAUDE_DIR/agents/"
cp bin/*.cjs            "$CLAUDE_DIR/driv/bin/"
cp templates/*.md       "$CLAUDE_DIR/driv/templates/"

It also registers lifecycle hooks in ~/.claude/settings.json for activity logging (covered later in this post). The hook registration is idempotent, so running the install script twice does not duplicate entries.

After installation, the project layout looks like this:

~/.claude/
  commands/driv/
    init.md              # /driv:init
    new-phase.md         # /driv:new-phase
    plan-phase.md        # /driv:plan-phase
    execute-phase.md     # /driv:execute-phase
    log.md               # /driv:log
  agents/
    driv-researcher.md
    driv-planner.md
    driv-executor.md
  driv/
    bin/driv-tools.cjs   # CLI for path resolution and state
    bin/driv-logger.cjs  # hook logger
    templates/           # PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md

The workflow

The framework splits development into phases. A project moves through four steps: initialize it once, add a phase to the roadmap, plan the phase (research, then planning), and execute the plan.

Running /driv:init creates a .planning/ directory in the project root and interviews you with four questions: project name, tech stack, what you’re building, and what phases you want to tackle. Your answers populate the template files. REQUIREMENTS.md gets a numbered list of concrete requirements (REQ-01, REQ-02, …). ROADMAP.md gets a checkbox per phase. STATE.md tracks where you are.

From there, the cycle is:

/driv:new-phase        # adds a phase to the roadmap, updates STATE.md
/driv:plan-phase <n>   # spawns researcher, then planner
/driv:execute-phase <n> # spawns executor

/driv:new-phase asks what you want to build next, appends the phase to ROADMAP.md, and offers to kick off planning immediately. /driv:plan-phase spawns the researcher agent, waits for it to write RESEARCH.md, then spawns the planner agent to produce PLAN.md. /driv:execute-phase spawns the executor agent with the plan contents.

If you tail .planning/driv.log during a session, you can watch the handoffs in real time:

[2026-03-23 20:38:17] ⟳ Summoning researcher agent — Research phase: Monthly entry reminder popup
[2026-03-23 20:39:49] ✓ researcher agent finished
[2026-03-23 20:40:01] ⟳ Summoning planner agent — Plan phase: Monthly entry reminder popup
[2026-03-23 20:40:40] ✓ planner agent finished
[2026-03-23 20:55:49] ⟳ Summoning executor agent — Execute phase: Monthly entry reminder popup
[2026-03-23 20:57:39] ✓ executor agent finished with errors

The CLI backbone

The orchestrator (the command files) needs structured data to route between agents: file paths, phase names, completion state. A small CLI tool, driv-tools.cjs, handles this. It is a zero-dependency Node.js script that commands call via Bash to get JSON back. It has six commands.

init returns the full project context: paths to all planning files, the current config, and the list of phases parsed from ROADMAP.md. Every command that needs to know where things are calls this first.

phase-path <n> is the most frequently called command. It takes a phase number, looks up the corresponding label in the roadmap, and derives a directory path from it:

const slug = phases[n - 1].label
  .toLowerCase()
  .replace(/[^a-z0-9]+/g, '-')
  .replace(/^-|-$/g, '');

const dirName = String(n).padStart(2, '0') + '-' + slug;
// "01-set-up-project-and-db-schema"

It also creates the phase directory if it does not exist, and returns JSON with paths to RESEARCH.md, PLAN.md, and SUMMARY.md within that directory.
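The slug derivation above can be wrapped into a self-contained helper for experimentation (the function name is mine, not the CLI's):

```javascript
// Derive a phase directory name from a roadmap label and phase number,
// e.g. ("Set up project and DB schema", 1) -> "01-set-up-project-and-db-schema".
function phaseDirName(label, n) {
  const slug = label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // collapse non-alphanumeric runs to "-"
    .replace(/^-|-$/g, '');       // trim leading/trailing dashes
  return String(n).padStart(2, '0') + '-' + slug;
}
```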

write-config persists key-value pairs to .planning/config.json. The init command uses it to store the project name and tech stack.

complete-phase <n> marks the nth unchecked phase in ROADMAP.md as done by replacing [ ] with [x]. The execute-phase command calls this after the executor finishes.
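That replacement logic is a few lines. This is my reconstruction of the described behavior, not the actual driv-tools.cjs source: walk the roadmap lines, count unchecked boxes, and flip the nth one.

```javascript
// Mark the nth unchecked "[ ]" checkbox in a roadmap string as "[x]".
function completePhase(roadmap, n) {
  let count = 0;
  return roadmap
    .split('\n')
    .map(line => {
      if (line.includes('[ ]')) {
        count += 1;
        if (count === n) return line.replace('[ ]', '[x]');
      }
      return line;
    })
    .join('\n');
}
```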

add-phase appends a new checkbox line to ROADMAP.md and returns the new phase number. status returns whether the project is initialized, the full phase list with completion flags, and the next pending phase. The new-phase command calls status first to check that the project exists before trying to add anything.
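Parsing the roadmap into that phase list could look like this; the checkbox format is taken from the post's own examples, the rest is a sketch:

```javascript
// Parse ROADMAP.md checkbox lines into { label, done } entries.
function parsePhases(roadmap) {
  const phases = [];
  for (const line of roadmap.split('\n')) {
    const m = line.match(/^- \[( |x)\] (.+)$/);
    if (m) phases.push({ label: m[2], done: m[1] === 'x' });
  }
  return phases;
}
```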

The agents never call the CLI themselves. Commands call it, extract the JSON, and pass the resulting paths to agents as part of the spawn prompt.

Three agents, three roles

Each agent is a markdown file with YAML frontmatter and a structured body. The frontmatter declares the agent’s name, description, model, and allowed tools:

---
name: driv-researcher
description: >
  Researches the technical domain for a driv phase. Reads project requirements
  and the phase description, then produces a RESEARCH.md that the driv-planner
  will consume to create PLAN.md files. Spawned by /driv:plan-phase.
tools: Read, Bash, Glob, Grep, WebFetch
model: sonnet
---

The body is the agent’s system prompt, organized with XML blocks. The <activation> block defines what inputs the agent receives and what steps to follow. A <scope> or <constraints> block lists what the agent must not do. The <output_format> block specifies the exact structure of the output file. And a <quality_gate> block provides a checklist the agent verifies before writing.

The researcher

The researcher reads project requirements, scans the codebase, and writes RESEARCH.md. The key design decision is in its <downstream_consumer> block:

<downstream_consumer>
Your RESEARCH.md will be read by driv-planner as its primary input.
The planner uses specific sections to make decisions:

## Standard stack    → the planner uses ONLY these libraries, no substitutions
## Patterns          → the planner structures tasks to follow these exactly
## Do not hand-roll  → the planner will never build custom solutions for these
## Pitfalls          → the planner embeds these as <verify> steps in tasks
## File map          → the planner uses these paths when naming files in tasks

If a section is vague, the planner will guess. Guesses produce wrong plans.
Be specific: include import names, function signatures, config keys.

BAD:  "Use a JWT library for authentication"
GOOD: "Use `io.jsonwebtoken:jjwt-api:0.12.6` with `jjwt-impl` and `jjwt-jackson`.
Sign: `Jwts.builder().subject(userId).expiration(exp).signWith(key).compact()`
Verify: `Jwts.parser().verifyWith(key).build().parseSignedClaims(token)`
— throws `ExpiredJwtException` on expiry, catch it in the filter."
</downstream_consumer>

This block tells the researcher exactly who will read its output and what happens when the output is vague. The difference in output quality is significant. Before I added this block, the researcher wrote exploratory recommendations (“you might consider using jjwt or Nimbus JOSE”). After adding it, the output shifted to concrete decisions with exact Maven coordinates, import statements, and method signatures. The downstream consumer pattern changed the agent’s behavior more than any other single prompt change I made.

The researcher’s quality gate enforces this further by checking that every library has an exact import pattern, every pitfall is specific enough to become a verify step, the file map is complete, and no hedge words appear (“might”, “could consider”, “perhaps”).
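The hedge-word check, for instance, reduces to a simple scan. The word list and function below are illustrative; the actual gate lives in the agent's prompt, not in code:

```javascript
// Scan text for hedge words that signal a non-committal recommendation.
const HEDGES = [/\bmight\b/i, /\bcould consider\b/i, /\bperhaps\b/i];

function findHedges(text) {
  return HEDGES.filter(re => re.test(text)).map(re => re.source);
}
```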

The researcher’s tool set is Read, Bash, Glob, Grep, WebFetch. It has no Write or Edit tools. It writes RESEARCH.md to its designated output path through Bash, and its <scope> block explicitly restricts it from modifying any other files.

The planner

The planner reads RESEARCH.md and REQUIREMENTS.md and produces PLAN.md containing XML task blocks. Its <task_scoping> rules keep tasks small:

<task_scoping>
A well-scoped task:
- Creates or modifies 1-4 files
- Can be described completely in 10-20 lines of action text
- Has a verify command that runs in under 30 seconds
- Builds on the state left by the previous task in this plan
</task_scoping>

Each plan has two to three tasks. The planner’s quality gate checks that every library in <action> matches what the researcher specified in “Standard stack”, every <verify> is a real shell command, no task touches more than four files, and no hedge words appear.

The planner gets Read, Write, Bash, Glob, Grep. It can create the plan file, but it cannot edit existing source code.

The executor

The executor reads the plan and implements each task. Its system prompt opens with a hard constraint:

You do not add features. You do not refactor unrelated code. You do not improve
things you notice along the way. You implement what is in each <task> block,
verify it, and commit it.

For each task, the executor follows a fixed loop: announce the task, read existing files, implement the <action> instructions, run the <verify> command, and make a git commit. If verification fails, it retries up to three times. After three failures it reports the error and moves on.

The executor is the only agent with Edit access. Its tool set is Read, Write, Edit, Bash, Glob, Grep. If the researcher or planner tries to modify the codebase, the tool restriction stops them before anything happens.

The plan format: XML task blocks

The planner outputs XML task blocks with named fields:

<task id="01-01">
  <n>Initialise Spring Boot project</n>
  <files>pom.xml, src/main/java/com/example/taskapi/TaskApiApplication.java</files>
  <action>
    Create pom.xml with spring-boot-starter-web, spring-boot-starter-data-jpa,
    postgresql driver, Java 21.
    Create TaskApiApplication.java with @SpringBootApplication and main method.
    Add application.yml with datasource config for PostgreSQL on localhost:5432.
  </action>
  <verify>./mvnw compile -q</verify>
  <done>Project compiles without errors.</done>
</task>

I chose XML over markdown for the plan format. Markdown lists are inherently ambiguous when parsed by an LLM. A bullet point under a “Files” heading could be interpreted as a suggestion, an example, or a hard constraint depending on how the model reads the surrounding context. XML tags are unambiguous by nature: <files> is a named, closed container. The model treats its contents as data, not prose. The same applies to <verify>. In a markdown section, the executor might skip a verification step or treat it as optional guidance. Wrapped in <verify> tags, it becomes a command to execute literally.

Claude has been trained on enough structured XML that it treats tag contents as data to preserve rather than prose to rephrase. In practice, the model is less likely to paraphrase or reinterpret the contents of a <files> tag compared to a markdown paragraph containing the same information.

The <verify> field is the most important part of the plan. A weak verification step lets broken implementations pass silently:

<!-- Weak: passes if the file exists, says nothing about correctness -->
<verify>test -f src/main/java/com/example/taskapi/auth/AuthController.java</verify>

<!-- Better: checks the code compiles -->
<verify>./mvnw compile -q</verify>

<!-- Best: actually exercises the behaviour -->
<verify>./mvnw test -pl . -Dtest=AuthControllerTest -q</verify>

The orchestrator: thin by design

The orchestrator is the set of command files that coordinate agent handoffs. It stays deliberately thin: if it accumulated the full output of every agent it spawned, it would end up with the same context-bloat problem the framework was designed to avoid. Instead, its context stays under a few thousand tokens per phase. It knows that RESEARCH.md was written, but it never reads the content. It knows that PLAN.md exists, but it only checks that it is non-empty.

Here is the actual plan-phase.md command (simplified for readability):

## Step 1: Get context
Run `driv-tools.cjs phase-path $ARGUMENTS`, extract PHASE_DIR, PHASE_LABEL,
RESEARCH_PATH, PLAN_PATH from the JSON output.

## Step 2: Spawn the researcher
Use the Task tool with subagent_type: "driv-researcher"
Pass: PHASE, REQUIREMENTS_PATH, OUTPUT_PATH (= RESEARCH_PATH)
Wait for completion.

## Step 3: Verify research output
Read RESEARCH_PATH. If it does not exist or is empty, tell the user and stop.

## Step 4: Spawn the planner
Use the Task tool with subagent_type: "driv-planner"
Pass: PHASE, RESEARCH_PATH, REQUIREMENTS_PATH, OUTPUT_PATH (= PLAN_PATH)
Wait for completion.

## Step 5: Verify plan output
Read PLAN_PATH. If it does not exist or is empty, tell the user and stop.

## Step 6: Update STATE.md
Append a log entry with the phase number, date, and file paths.

Each step is explicit and mechanical. The orchestrator does not decide what to do. It follows the script. This is possible because all the intelligence lives in the agent system prompts and the research/plan documents on disk.

There is one practical workaround in the spawn prompts. Each begins with “Before doing anything else, read your full instructions from: $HOME/.claude/agents/driv-researcher.md”. This addresses a Claude Code issue where agent body content was not always injected correctly into the subagent’s context. Having the agent read its own file as the first step ensures the instructions land reliably.

Files as the communication layer

All inter-agent state lives in the .planning/ directory:

.planning/
  config.json
  PROJECT.md
  REQUIREMENTS.md
  ROADMAP.md
  STATE.md
  driv.log
  phases/
    01-set-up-project/
      RESEARCH.md     <- written by researcher, read by planner
      PLAN.md         <- written by planner, read by executor
    02-build-api/
      RESEARCH.md
      PLAN.md

The researcher writes RESEARCH.md. The planner reads RESEARCH.md and writes PLAN.md. The executor reads PLAN.md and writes code. No agent reads another agent’s prompt or output except through these files.

This makes everything inspectable. If the executor writes bad code, you trace it backwards: read the PLAN.md to see if the instructions were wrong, then read RESEARCH.md to see if the research was wrong. The debug process is cat.

The file-based approach also means the framework survives a closed session. If you plan a phase, close Claude Code, and come back later, all the state is on disk. Run /driv:execute-phase 1 and it picks up where you left off.

Logging through hooks

Claude Code supports lifecycle hooks: shell commands that run before or after specific tool calls, or when a session ends. The framework registers three hooks during installation.

The install script merges these entries into ~/.claude/settings.json:

// PreToolUse: fires before Task tool
settings.hooks.PreToolUse.push({
  matcher: 'Task',
  hooks: [{ type: 'command', command: `node ${logger} pre-task`, timeout: 5 }]
});

// PostToolUse: fires after Task tool
settings.hooks.PostToolUse.push({
  matcher: 'Task',
  hooks: [{ type: 'command', command: `node ${logger} post-task`, timeout: 5 }]
});

// Stop: fires when Claude's turn ends
settings.hooks.Stop.push({
  hooks: [{ type: 'command', command: `node ${logger} stop`, timeout: 5 }]
});

The matcher: 'Task' field means these hooks only fire for the Task tool. Regular Bash and Read calls do not trigger them, so the log stays clean.

The logger script (driv-logger.cjs) reads a JSON payload from stdin, extracts the agent type and description from tool_input, and appends a timestamped line to .planning/driv.log. It strips the driv- prefix from agent names for readability (“driv-researcher” becomes “researcher agent”). On the stop event it writes a separator line.

One safety detail: the logger checks whether .planning/ exists before writing. Since the hooks are registered globally in settings.json, they fire in every project. The existence check means the logger silently exits in projects that are not using the framework.

Walking through a full cycle

To make this concrete, here is what a full cycle looks like on a new project.

You open Claude Code in your project directory and run /driv:init. The init command creates .planning/, copies templates, and asks four questions:

> What is the name of this project?
  task-api

> What is the tech stack?
  Java 21, Spring Boot 3, PostgreSQL

> What are you building?
  A REST API for task management with JWT authentication and role-based access.

> List the development phases, one per line.
  Set up project and DB schema
  Build API endpoints
  Add JWT auth

It populates REQUIREMENTS.md with numbered requirements derived from your answers, creates checkboxes in ROADMAP.md, and saves the config. You get a confirmation:

driv initialised ✓

.planning/
  PROJECT.md      — project vision
  REQUIREMENTS.md — 5 requirements
  ROADMAP.md      — 3 phases
  STATE.md        — session memory

Next: /driv:plan-phase 1

You run /driv:plan-phase 1. The command calls driv-tools.cjs phase-path 1, which parses ROADMAP.md, finds the first phase (“Set up project and DB schema”), derives the directory name 01-set-up-project-and-db-schema, and returns JSON with all the paths. The command then spawns the researcher agent with the phase label, the path to REQUIREMENTS.md, and the output path for RESEARCH.md.

The researcher scans the codebase, checks what already exists, researches any unknowns, and writes RESEARCH.md. The output looks something like this (abbreviated):

# Research: Set up project and DB schema

## Context
Initial project setup for a REST API with Spring Boot 3 and PostgreSQL.

## Standard stack
- `spring-boot-starter-web` — embedded Tomcat, REST support
- `spring-boot-starter-data-jpa` — Hibernate + Spring Data repositories
- `org.postgresql:postgresql` — JDBC driver
- Java 21, Maven wrapper for builds

## Patterns
- Main class annotated @SpringBootApplication in com.example.taskapi
- Entities in model/, repositories in repository/, controllers in controller/
- Database config in application.yml, not application.properties

## Pitfalls
- spring.jpa.hibernate.ddl-auto must be "validate" in production, use "create" only for initial dev
- PostgreSQL JDBC URL format: jdbc:postgresql://host:5432/dbname

## File map
- CREATE pom.xml
- CREATE src/main/java/com/example/taskapi/TaskApiApplication.java
- CREATE src/main/resources/application.yml

The orchestrator verifies the file exists, then spawns the planner with the research path and a new output path for PLAN.md. The planner reads the research, checks the codebase state, and writes task blocks:

<task id="01-01">
  <n>Initialise Spring Boot project</n>
  <files>pom.xml, src/main/java/com/example/taskapi/TaskApiApplication.java,
         src/main/resources/application.yml</files>
  <action>
    Create pom.xml with parent spring-boot-starter-parent 3.2.x, Java 21.
    Add dependencies: spring-boot-starter-web, spring-boot-starter-data-jpa,
    postgresql driver.
    Create TaskApiApplication.java with @SpringBootApplication and main method.
    Create application.yml with spring.datasource.url, username, password
    for local PostgreSQL.
  </action>
  <verify>./mvnw compile -q</verify>
  <done>Project compiles without errors.</done>
</task>

You review the plan, then run /driv:execute-phase 1. The executor reads each task block, implements the code, runs the verify command, and makes a git commit per task. After all tasks are processed, driv-tools.cjs complete-phase 1 marks the first checkbox in ROADMAP.md as done.

For the next phase, you run /driv:new-phase, tell it what you want to build, and the cycle repeats.

What I learned building this

The downstream consumer pattern was the single highest-leverage change. Telling the researcher “the planner will turn your output into executable tasks” produced better output than any amount of prompt tuning about format or specificity. Any pipeline where one stage produces input for the next benefits from making that consumer relationship explicit in the producer’s instructions.

Tool restrictions enforce role boundaries more reliably than prompt instructions. You can ask an agent to stay in its lane, and it might comply. Removing the tool makes the constraint structural. I spent time early on writing elaborate scope instructions before realizing that the frontmatter tools field did the job in one line.

The coordination layer can be simpler than you expect. Files on disk are durable, inspectable, and require zero infrastructure. There is no hidden state, no message serialization, and the debug process is cat. For a system where agents run sequentially and produce discrete artifacts, a shared directory is enough.