The Lossy Handoff: Why Agents Re-Implement Proven Code Badly

Explorer agents summarize for comprehension, not reproduction — so the writing agent rebuilds proven spike code from a compression artifact and reintroduces bugs the spikes had already solved. The fix was a skill that forces reading the source directly.

2026-03-30 // RAW LEARNING CAPTURE

PROJECTCLAUDE-PLUGINS

I ran three parallel spikes to prove out three separate bodies of work, then went to combine them into one cohesive prototype. Claude sent Explorer agents to understand each spike, came back with summaries, and wrote the combined implementation from those. The result was broken — and not subtly. It was broken in ways the spikes had already solved. The working path existed, in proven code, and Claude was re-implementing it from a description instead of referencing the thing that worked.

Why the handoff loses fidelity

The failure has a specific shape:

An Explorer agent reads the spike code and returns a prose summary.
The summary is optimized for comprehension — "here's how the auth system works."
The writing agent treats that summary as ground truth.
It re-implements the work from a lossy description instead of the proven source.
With three spikes fused in one shot, there are too many variables left to debug when it breaks.

This isn't a model-capability problem. Even an Opus-level Explorer is making editorial choices about what to include, and those choices serve understanding, not reproduction. A detail that reads as incidental to the Explorer — "oh, the token refresh uses a 30-second buffer" — can be load-bearing for the implementation. Summarizing for comprehension and writing from a reference are different purposes, and a summary built for one doesn't serve the other.

There's a second, sneakier risk. Explorer agents have no guidance on response format. If one thinks it understands the goal, it comes back prescriptive — "implement it this way." That lands in the writing agent's context as authoritative-sounding guidance, indistinguishable from factual reporting, and gets taken as gospel.

The fix: a /reference skill

I built a skill at ~/Code/claude-plugins/reference/SKILL.md that enforces one discipline — the writing agent must read source files directly before writing code that adapts them.

The protocol:

Identify the references — which files, spikes, branches.
Map with Explorer agents if needed — but prompt for inventory (paths, line ranges, signatures), not comprehension or recommendations.
Read source directly — load the actual code into context.
Plan with source visible — integration planning happens only after reading.
Write — with references available for cross-checking.

A feedback memory acts as a tripwire: when Claude notices it's about to write from agent summaries of proven source, it flags that and invokes the skill. During the testing phase every activation is visible, so I can judge whether it's helping or just adding overhead.

Why a skill and not a memory

A memory is opaque — I can't see when it fires or how it shifts behavior. A skill invocation shows up in the conversation. I see /reference fire and I can judge it. I also deliberately didn't try to change Explorer behavior globally: I don't fully understand its system prompt, Claude Code's internals will drift, and hard-coded prompting guidance rots. A scoped skill can be tuned or pulled.

The durable lesson is about where fidelity dies. The Explorer's job is to hand you a map so you can navigate. The writing agent's job is to read the territory before it builds on it. The boundary between "I know what exists" and "I'm about to build from it" is exactly where a summary stops being enough — and that applies to any orchestration pattern where one agent explores and another acts.

Committed as 4fea1aa in claude-plugins.

LOG.ENTRY_END

ref:claude-plugins

RAW