Last month I built a small agent framework for Claude Code. Three agents, three commands, about 400 lines of code total. No external dependencies, no API layer, no message queue. Just markdown files, a Node.js CLI, and the filesystem.
The patterns I landed on are transferable to any system where you need multiple autonomous processes to collaborate on a shared task without stepping on each other. I’ll go through the structural decisions that make it work: how agents get spawned, how they share information, how you prevent them from going off the rails, and what happens when you give each agent its own fresh context window instead of cramming everything into one.
The Problem With One Big Session
LLM output quality degrades as the context window fills up. A session that accumulates 50 tool calls, research notes, and thousands of lines of code will produce worse results towards the end than it did at the beginning. The model starts reusing variable names from earlier in the conversation, forgetting constraints, or producing code that contradicts what it wrote a few steps back.
The standard solution in agent frameworks is to split work across multiple isolated contexts. Each agent gets a fresh context window with only the information it needs, then writes its output to a location the next agent can read.
Three Agents, Three Roles
The framework splits a development phase into three sequential steps, each handled by a dedicated agent.
The researcher goes first. It reads the project requirements, scans the codebase, and produces a research document with concrete technical decisions: which libraries to use, which patterns to follow, which files to create or modify. The key constraint is specificity: “Use jose@^5 for JWT” with exact import statements and usage patterns. This matters because its output feeds directly into the next agent.
The planner reads the research document and turns it into an executable plan. The plan contains two to three XML task blocks, each specifying exactly which files to touch, what to implement, and a shell command that verifies the result. The planner never invents. Every library and file path in the plan must come from the research document.
The executor reads the plan and implements each task in order. For each task it reads the relevant files, writes the code, runs the verify command, and makes a git commit. If verification fails, it retries up to three times before moving on.
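The executor's per-task behaviour amounts to a bounded retry loop. As a sketch in JavaScript (illustrative only; the real executor is an LLM following prompt instructions, and the function names here are stand-ins for its tool calls):

```javascript
// Illustrative control flow for one task. implement, verify, and commit
// stand in for the executor's LLM tool calls; none of these names come
// from the actual framework.
function executeTask(task, { implement, verify, commit }, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    implement(task, attempt);   // write or fix the code for this task
    if (verify(task)) {         // run the task's verify command
      commit(task);             // one task, one commit
      return { ok: true, attempt };
    }
  }
  // After the final failed attempt, report the failure and move on.
  return { ok: false, attempt: maxRetries };
}
```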
The CLI Backbone
The orchestrator needs structured data to route between agents: file paths, phase names, progress state. So the framework includes a small CLI tool, driv-tools.cjs, a zero-dependency Node.js script that commands call via Bash to get JSON back. It handles path resolution, config persistence, and phase completion tracking.
Its central command is phase-path <n>, which resolves a phase number to a phase directory. The roadmap is a markdown file with checkbox lines:
- [x] Set up project and DB schema
- [ ] Build API endpoints
- [ ] Add JWT auth
The CLI parses these lines, matches the phase number to the corresponding label, and derives a directory slug from it:
const slug = phases[n - 1].label
  .toLowerCase()
  .replace(/[^a-z0-9]+/g, '-')
  .replace(/^-|-$/g, '');
const dirName = String(n).padStart(2, '0') + '-' + slug;
// "01-set-up-project-and-db-schema"
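Put together as a runnable sketch (the parsing logic is reconstructed from the description above; the real driv-tools.cjs also handles config and state):

```javascript
// Parse "- [x] Label" checkbox lines into phase objects.
// Reconstructed from the description; field names are illustrative.
function parsePhases(roadmapMd) {
  const phases = [];
  for (const line of roadmapMd.split('\n')) {
    const m = line.match(/^- \[([ x])\] (.+)$/);
    if (m) phases.push({ label: m[2].trim(), done: m[1] === 'x' });
  }
  return phases;
}

// Derive the phase directory name with the slug logic shown above.
function phaseDirName(phases, n) {
  const slug = phases[n - 1].label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '');
  return String(n).padStart(2, '0') + '-' + slug;
}

const roadmap = [
  '- [x] Set up project and DB schema',
  '- [ ] Build API endpoints',
  '- [ ] Add JWT auth',
].join('\n');

console.log(phaseDirName(parsePhases(roadmap), 1));
// "01-set-up-project-and-db-schema"
```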
Every command in the framework calls this tool first, then uses the JSON output to construct the prompts it passes to agents. The agents never call the CLI themselves. They receive paths, read the files at those paths, and write their output.
Files as the Communication Layer
All inter-agent state lives in a .planning/ directory in the project root. When you run /driv:plan-phase 1, the orchestrator calls the CLI tool to resolve the phase directory and get structured JSON with all the relevant paths:
{
  "phaseDir": ".planning/phases/01-set-up-project",
  "researchPath": ".planning/phases/01-set-up-project/RESEARCH.md",
  "planPath": ".planning/phases/01-set-up-project/PLAN.md"
}
It then spawns the researcher as a subagent, passing the phase name, the path to the requirements file, and the output path where it should write its results. The researcher does its work and writes RESEARCH.md. When it finishes, the orchestrator spawns the planner with the path to that research file as input and a new output path for PLAN.md. Each agent receives only the file paths it needs. It reads those files itself in its own context window.
.planning/
  PROJECT.md
  REQUIREMENTS.md
  ROADMAP.md
  STATE.md
  phases/
    01-set-up-project/
      RESEARCH.md   <- written by researcher, read by planner
      PLAN.md       <- written by planner, read by executor
This makes everything inspectable. If the executor writes bad code, you trace it to a bad plan by reading files. It also means the framework survives a closed session, since you can pick up where you left off by running the next command.
Spawning Agents and Restricting Their Tools
Each agent is defined as a markdown file with YAML frontmatter. The frontmatter declares the agent’s name, a description, the model to use, and a list of allowed tools:
---
name: driv-researcher
description: Researches the technical domain for a phase.
tools: Read, Bash, Glob, Grep, WebFetch
model: sonnet
---
The body of the file is the agent’s system prompt, written in plain text with XML blocks for structure. It tells the agent what it is, what inputs to expect, what output to produce, and how to format it.
When the orchestrator needs to spawn an agent, it uses Claude Code’s Task tool with a subagent_type parameter that matches the agent’s name. Claude Code loads the corresponding markdown file as the system prompt and gives the agent a fresh 200k-token context window containing only that prompt, the task instructions from the orchestrator, and any project-level configuration files.
The tool restrictions enforce role boundaries. The researcher gets Read, Bash, Glob, Grep, WebFetch but cannot write source files. The planner gets Read, Write, Bash, Glob, Grep so it can produce the plan, but cannot edit existing files. The executor is the only agent with Edit access. If an agent hallucinates a task outside its role, it simply lacks the tool to carry it out.
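For contrast, the executor's definition carries the wider tool list. A sketch (the description and exact tool set in the real framework may differ):

```yaml
---
name: driv-executor
description: Implements the tasks in a phase plan.
tools: Read, Edit, Write, Bash, Glob, Grep
model: sonnet
---
```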
Telling Agents Who Reads Their Output
By default, an LLM writes for a human reader. It hedges, explores alternatives, and offers options. “You might consider using jose or jsonwebtoken for JWT handling” is a reasonable thing to say to a person. It is a terrible thing to say to the next agent in your pipeline.
The researcher’s system prompt includes a <downstream_consumer> block that explicitly states its output goes to the planner, and that vague output will produce wrong plans. This single addition changed the output from exploratory recommendations to concrete decisions with exact import statements, version constraints, and usage patterns.
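The block reads roughly like this (the wording here is illustrative, not the literal prompt):

```xml
<downstream_consumer>
  Your output is read by the planner agent, not a human. The planner turns
  your decisions into executable tasks and cannot resolve ambiguity. Do not
  offer alternatives. Commit to one library, one version, one pattern, and
  show the exact import statements and usage.
</downstream_consumer>
```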
The same principle applies at each handoff. The planner knows the executor will follow its instructions literally, so it writes precise implementation steps. The executor knows its output is git commits, so it writes code that passes a verification command.
The Plan Format: XML Task Blocks
The plan file uses XML task blocks with named fields:
<task id="01-01">
  <name>Initialise TypeScript project</name>
  <files>package.json, tsconfig.json, src/index.ts</files>
  <action>
    Run: npm init -y
    Install: npm install express pg
    Install dev: npm install -D typescript ts-node @types/node @types/express
    Create tsconfig.json with target: "ES2022", module: "commonjs", strict: true.
    Create src/index.ts with a basic Express app on PORT env var (default 3000).
  </action>
  <verify>npx ts-node src/index.ts & sleep 2 && curl -sf http://localhost:3000 ; kill $!</verify>
  <done>Express server starts without errors.</done>
</task>
I chose XML over markdown for this because Claude parses XML reliably and the named tags remove ambiguity. In a markdown list, the executor might treat an item as optional or advisory. In XML, <files> means “these are the files you touch, no others” and <verify> means “run this command, exit 0 means success.” There is no room for interpretation.
Each plan contains two to three tasks, scoped so that each one touches at most four files and can be verified with a single shell command that runs in under 30 seconds. One task, one commit, one verification. If verification fails after three retries, the executor moves on and reports the failure.
The <verify> field is the most important part of the plan. A weak verification step lets broken implementations pass silently:
<!-- Weak: passes if the file exists, says nothing about correctness -->
<verify>test -f src/routes/auth.ts</verify>

<!-- Better: checks the file contains what matters -->
<verify>grep -q "router.post.*login" src/routes/auth.ts</verify>

<!-- Best: actually exercises the behaviour -->
<verify>curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:3000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"test@test.com","password":"wrong"}' \
  | grep -q "401"</verify>
The Orchestrator: Thin by Design
The orchestrator stays deliberately thin. If it accumulated the full output of every agent it spawned, it would end up with the same bloated context problem the framework was designed to avoid. Instead, it stays under a few thousand tokens per phase. It knows that RESEARCH.md was written, but it never reads it. It knows that PLAN.md exists, but it only checks that it is non-empty.
The commands themselves are markdown files that Claude Code reads as step-by-step instructions. The core of the plan-phase command, simplified:
Step 1: Run driv-tools.cjs phase-path $ARGUMENTS, extract paths from JSON.
Step 2: Spawn driv-researcher with PHASE, REQUIREMENTS_PATH, OUTPUT_PATH.
Step 3: Verify RESEARCH.md exists and is non-empty.
Step 4: Spawn driv-planner with PHASE, RESEARCH_PATH, OUTPUT_PATH.
Step 5: Verify PLAN.md exists and is non-empty.
Step 6: Append a log entry to STATE.md.
Each step is explicit and mechanical. The orchestrator does not decide what to do. It follows the script.
Logging Through Hooks
The framework includes an activity log so you can watch what is happening while agents work. Claude Code supports lifecycle hooks configured in a settings file. The framework registers three: one that fires before an agent is spawned, one that fires after it finishes, and one that fires when a session ends.
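The registration looks roughly like this (the event and field names follow Claude Code's hook settings schema as I understand it, and the script path is illustrative):

```json
{
  "hooks": {
    "PreToolUse": [
      { "matcher": "Task",
        "hooks": [{ "type": "command", "command": "node ~/.claude/hooks/driv-log.cjs pre" }] }
    ],
    "PostToolUse": [
      { "matcher": "Task",
        "hooks": [{ "type": "command", "command": "node ~/.claude/hooks/driv-log.cjs post" }] }
    ],
    "SessionEnd": [
      { "hooks": [{ "type": "command", "command": "node ~/.claude/hooks/driv-log.cjs end" }] }
    ]
  }
}
```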
Each hook calls a small logger script that reads a JSON payload from stdin, extracts the agent type and description, and appends a timestamped line to .planning/driv.log:
[2025-03-20 14:02:11] Summoning researcher agent — Research phase: Set up project
[2025-03-20 14:03:44] researcher agent finished
[2025-03-20 14:03:45] Summoning planner agent — Plan phase: Set up project
[2025-03-20 14:04:28] planner agent finished
[2025-03-20 14:04:28] Session ended
The hooks only match on the Task tool, so regular Bash and Read calls do not clutter the log. The logger also checks for the .planning/ directory before writing, which means it silently does nothing in projects that are not using the framework. Since the hooks are registered globally, this check keeps them from interfering elsewhere.
What I Took Away From Building This
The framework is around 400 lines of JavaScript, a few markdown command files, and three agent definitions. It is a proof of concept, but building it made several abstract ideas about agent architecture concrete.
Agents produce better output when they know the consumer’s constraints. Telling the researcher “the planner will turn your output into executable tasks” is a different prompt than “document the technical domain.” The output format, specificity, and tone all shift. This applies beyond LLMs: any pipeline where one stage produces input for the next benefits from making that relationship explicit.
Role boundaries are easier to enforce through capability restrictions than through instructions. You can tell an agent “do not write code” in its prompt, and it might comply. Or you can remove the Write tool, and the constraint becomes structural. The second approach fails loudly and immediately.
Finally, the coordination layer can be much simpler than you think. Files on disk are durable, inspectable, and require zero infrastructure. There is no hidden state, no message serialization, and the debug process is cat. For a system where agents run sequentially and produce discrete artifacts, a shared directory is sufficient.