No Vibes Allowed — Solving Hard Problems in Complex Codebases

Context Engineering · Coding Agents · Dex Horthy, HumanLayer · AI Engineer · December 2025 · ~28 min

Overview

Dex Horthy — founder of HumanLayer and author of the viral "12 Factor Agents" essay that coined the term "context engineering" — explains why AI coding tools fail in brownfield codebases and presents a battle-tested framework for fixing it. After eight weeks of intensive experimentation, his three-person team achieved 2–3x throughput using a Research → Plan → Implement workflow built entirely around context window management. This talk covers the theory (the Dumb Zone, stateless LLMs, trajectory traps) and the practice (compaction, sub-agents for context control, progressive onboarding, plans with code snippets, and mental alignment through plan review).

1 The Problem With AI Coding Today 0:00

Dex opens with a reference to a study of 100,000 developers across companies of all sizes presented at AI Engineer in June 2025. The findings were sobering:

Developers using AI coding tools ship more code, but much of it is rework — fixing "slop" from the week before.
AI works exceptionally well for greenfield projects — new dashboards, fresh React apps, weekend prototypes.
In brownfield codebases — 10-year-old Java monoliths, mature production systems — AI can make developers less productive.

"Too much slop. Tech debt factory. It's just not going to work for our codebase. Maybe someday when the models get better." — Common refrain from founders and senior engineers.

Dex's thesis: we don't need to wait for better models. The answer is context engineering — managing what goes into the LLM's context window to get the best output from today's models.

2 Context Engineering — From Skeptic to 2–3x Throughput 0:47

Dex admits he was initially unimpressed by Claude Code. But over eight weeks of intense experimentation, his three-person team achieved 2–3x more throughput — shipping so much code they had to fundamentally change how they collaborated.

The result went viral on Hacker News in September 2025. Thousands of developers grabbed their open-source Research → Plan → Implement prompt system from GitHub. The core goals:

AI that works in brownfield codebases — not just greenfield toys
Complex problem solving — not just simple changes
No slop — code quality that passes expert review
Mental alignment — keeping the entire team on the same page
Maximum token spend — offloading as much meaningful work to AI as possible

3 Advanced Context Engineering 4:43

Dex frames the spectrum of approaches from naive to advanced:

Level 0 — The Argue Loop: Ask the agent to do something. Tell it why it's wrong. Re-steer. Repeat until you run out of context, give up, or cry.

Level 1 — Fresh Context Windows: When a conversation goes off track, start a new context window. Same prompt, same task, fresh start with a note on what didn't work. The signal to restart? When Claude starts apologizing profusely.

Level 2 — Intentional Compaction: The real breakthrough. Periodically compress your existing context window into a markdown file. Review it, tag it, and when a new agent starts, it gets straight to work without re-doing all the exploration.

LLMs are stateless. Every turn could go in hundreds of right or wrong directions. The ONLY thing that influences which path the model takes is what's currently in the conversation.

Four optimization axes for context windows:

Correctness — no incorrect information (the worst contaminant)
Completeness — no missing information the agent needs
Size — minimal noise and irrelevant content
Trajectory — the conversational pattern matters (mistake→yelling→mistake teaches the model to keep failing)

4 The Dumb Zone 7:07

Dex introduces his "very academic concept": The Dumb Zone.

In Claude's ~168,000 token context window, around the 40% mark you start seeing diminishing returns in output quality. The first 40% is the "smart zone" where high-quality reasoning happens. Everything after is increasingly degraded.
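As a rough worked example of that heuristic, the sketch below turns the two figures quoted above (a ~168,000-token window and a 40% mark) into a token budget. The function names are illustrative only, and both numbers are approximations from the talk rather than hard limits of any tool.

```python
# Figures quoted in the talk; treat both as rough heuristics, not hard limits.
CONTEXT_WINDOW_TOKENS = 168_000
SMART_ZONE_PERCENT = 40


def smart_zone_budget(window: int = CONTEXT_WINDOW_TOKENS,
                      percent: int = SMART_ZONE_PERCENT) -> int:
    """Return how many tokens fit before the heuristic 'dumb zone' begins."""
    return window * percent // 100


def in_dumb_zone(tokens_used: int,
                 window: int = CONTEXT_WINDOW_TOKENS,
                 percent: int = SMART_ZONE_PERCENT) -> bool:
    """True once usage has gone past the smart-zone budget."""
    return tokens_used > smart_zone_budget(window, percent)


budget = smart_zone_budget()  # 168,000 * 40 / 100 = 67,200 tokens
```

Under these assumptions, roughly 67,000 tokens are available for high-quality work before it is time to compact or start a fresh window.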

"If you have too many MCP tools loaded in your coding agent, you're doing all your work in the dumb zone and you'll never get good results."

Geoff Huntley's research on coding agents: "The more you use of the context window, the worse outcomes you'll get." This reframes the entire coding agent workflow: everything is about cleverly avoiding the dumb zone.

Sub-agents are for context control, not role play. Don't create front-end, backend, and QA sub-agents. Instead, fork a new context window when you need to explore a large codebase. The sub-agent does all the reading and returns a succinct summary. The parent agent reads one file and gets to work — without consuming smart-zone tokens on exploration.
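The pattern can be sketched in a few lines. This is a toy simulation, not the API of any real coding agent: `explore_in_subagent`, `stub_summarize`, and the word-count token estimate are all hypothetical stand-ins, and in practice the summary would come from a model call.

```python
from typing import Callable


def rough_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def explore_in_subagent(files: dict[str, str],
                        summarize: Callable[[list[str]], str]) -> str:
    """Read every file inside a throwaway context and return only a summary."""
    sub_context: list[str] = []  # fresh window, discarded when we return
    for path, body in files.items():
        sub_context.append(f"# {path}\n{body}")
    return summarize(sub_context)


def stub_summarize(context: list[str]) -> str:
    """Stand-in for a model call: just list the files that were read."""
    names = [chunk.splitlines()[0].removeprefix("# ") for chunk in context]
    return "Relevant files: " + ", ".join(names)


# Hypothetical file contents, padded to simulate a large exploration.
files = {
    "auth/login.py": "def login(): ...\n" * 200,
    "auth/session.py": "class Session: ...\n" * 200,
}

parent_context = ["Task: fix the session expiry bug"]
parent_context.append(explore_in_subagent(files, stub_summarize))

explored = sum(rough_tokens(body) for body in files.values())
kept = sum(rough_tokens(message) for message in parent_context)
```

The parent context ends up holding only the task and the one-line summary, while the cost of reading both files stays inside the discarded sub-context.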

5 Compaction 8:31

The main consumers of context window space:

File search and discovery — the agent hunting through the codebase
Code flow understanding — reading files to understand connections
File editing — the actual changes
Test and build output — often verbose and token-hungry
MCP tool output — especially tools dumping JSON with UUIDs

A good compaction captures: "This is exactly what we're working on. These are the exact files and line numbers that matter."

Compaction is the core mechanism for staying in the smart zone. You compress truth (research) and intent (plans) into concise markdown that gives fresh agents everything they need to start working immediately.
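One way to picture a compaction artifact is as a small structured record rendered to markdown. The sketch below is an assumption about what such a file could contain (goal, files with line ranges, and a note on what did not work); the class names, headings, and example paths are hypothetical and not taken from the team's actual prompts.

```python
from dataclasses import dataclass, field


@dataclass
class FileRef:
    path: str
    start_line: int
    end_line: int
    note: str


@dataclass
class Compaction:
    goal: str
    files: list[FileRef] = field(default_factory=list)
    dead_ends: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the compaction as a short markdown handoff document."""
        lines = ["# Current task", self.goal, "", "## Files that matter"]
        for ref in self.files:
            lines.append(
                f"- `{ref.path}:{ref.start_line}-{ref.end_line}`: {ref.note}"
            )
        if self.dead_ends:
            lines += ["", "## Already tried, did not work"]
            lines += [f"- {item}" for item in self.dead_ends]
        return "\n".join(lines) + "\n"


# Hypothetical example values for illustration only.
compaction = Compaction(
    goal="Fix the off-by-one error in span reporting.",
    files=[FileRef("src/parser.rs", 120, 168, "builds spans for each token")],
    dead_ends=["Patching the lexer instead; spans are already wrong on entry."],
)
markdown = compaction.to_markdown()
```

A fresh agent given `markdown` starts with the task, the exact locations, and the known dead ends, without repeating the exploration that produced them.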

6 Research → Plan → Implement 10:55

The core workflow is structured around three phases of context management:

Phase 1: Research — Understand how the system works. Sub-agents explore the codebase and produce a compressed markdown document with the specific files, code flows, dependencies, and exact line numbers that matter. This is a compression of truth derived from actual code, not stale docs.

Phase 2: Planning — Takes research output plus the bug ticket/feature requirement and creates a detailed implementation plan with exact steps, file names, line numbers, actual code snippets, and how to test after every change.

"If you read one of these plans, you can see very easily how the dumbest model in the world is probably not going to screw this up."

Phase 3: Implementation — With a good plan, this is "the least exciting part." The agent executes the plan step by step. Context stays low because all exploration and decision-making happened earlier.
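The planning requirements from Phase 2 (exact steps, file names, line numbers, code snippets, and a test after every change) lend themselves to a mechanical completeness check before implementation starts. The sketch below is illustrative: the `PlanStep` fields and `incomplete_steps` helper are hypothetical names, and a check like this supplements human review of the plan rather than replacing it.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str
    file: str
    lines: str    # line range the step touches, e.g. "40-52"
    snippet: str  # the code the agent is expected to write
    verify: str   # command to run after this step


def incomplete_steps(plan: list[PlanStep]) -> list[int]:
    """Return 1-based indexes of steps that are missing any required detail."""
    bad = []
    for index, step in enumerate(plan, start=1):
        values = (step.description, step.file, step.lines,
                  step.snippet, step.verify)
        if not all(value.strip() for value in values):
            bad.append(index)
    return bad


# Hypothetical plan: the second step lacks line numbers, snippet, and test.
plan = [
    PlanStep("Add expiry check", "auth/session.py", "40-52",
             "if session.expired():\n    raise SessionExpired()",
             "pytest tests/test_session.py"),
    PlanStep("Update docs", "README.md", "", "", ""),
]
flagged = incomplete_steps(plan)
```

Here `flagged` points at the second step, which is the kind of gap that would otherwise push decisions back into the implementation phase.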

7 One-Shotting a Fix in a 300K LOC Rust Codebase 14:29

Dex battle-tested the workflow on his podcast with Vaibhav Gupta (CEO of Boundary ML, makers of BAML):

Test 1: One-shot fix to BAML's 300,000-line Rust codebase for a programming language. In 90 minutes they built research documents and compared plans produced with and without research. By Tuesday morning, the CTO confirmed: "Yeah, this looks good. We'll get it in the next release."

Test 2: A 7-hour Saturday session shipped 35,000 lines of code to BAML. Vaibhav estimated it represented 1–2 weeks of manual work.

Test 3: Removing Hadoop dependencies from Parquet Java. It did not go well — they threw everything out and went back to the whiteboard.

"Do not outsource the thinking. AI cannot replace thinking. It can only amplify the thinking you have done — or the lack of thinking you have done."

8 Semantic Diffusion & Spec-Driven Dev 17:15

Dex takes a firm stance: "spec-driven development" is broken — not the idea, but the phrase itself.

He invokes Martin Fowler's 2006 concept of semantic diffusion: a good term gets popular, everyone starts using it to mean different things, and it becomes useless. It already happened with "agent." Now it's happening with "spec-driven dev", which variously means writing a better prompt, a PRD, verifiable feedback loops, treating code like assembly, using markdown files, or even library documentation.

"Spec-driven dev is overhyped. It's useless now. It's semantically diffused." — Focus on actual techniques (compaction, context management) rather than buzzwords.

9 Onboarding Agents — The Memento Problem 18:47

Dex references the movie Memento: a man wakes up with no memory and reads his own tattoos. Every new agent context is that man. If you don't onboard your agents, they will make things up.

Naive approach: A massive onboarding doc in the repo root. Problem: it either consumes all your smart-zone tokens or is too incomplete.

Better approach — Progressive disclosure: Shard onboarding context down the stack. Root-level file provides high-level context, then each directory adds deeper context relevant to that area. The agent pulls root context plus only the sub-context for its current task — leaving plenty of room in the smart zone.
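A minimal sketch of progressive disclosure, assuming each directory may hold its own context file: walk from the repo root down to the directory being worked on and collect only the files found along that path. The filename `AGENT_CONTEXT.md` and the function name are hypothetical; substitute whatever convention your tooling actually reads.

```python
from pathlib import Path

# Hypothetical filename; use whatever your agent tooling actually loads.
CONTEXT_FILENAME = "AGENT_CONTEXT.md"


def onboarding_files(repo_root: Path, working_dir: Path) -> list[Path]:
    """Collect context files from the repo root down to working_dir.

    Sibling directories are never visited, so their context stays out of
    the window. Raises ValueError if working_dir is not inside repo_root.
    """
    root = repo_root.resolve()
    target = working_dir.resolve()
    dirs = [root]
    for part in target.relative_to(root).parts:
        dirs.append(dirs[-1] / part)
    return [d / CONTEXT_FILENAME for d in dirs
            if (d / CONTEXT_FILENAME).is_file()]
```

For a task in `integrations/jira`, this would pick up the root file and the one in `integrations/`, while a context file under an unrelated directory such as `billing/` is left out.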

The staleness problem: rank code, function names, comments, and documentation by "the amount of lies you can find." Documentation has the most lies; code has the fewest.

10 Internal Docs vs On-Demand Context 21:54

Instead of maintaining documentation, Dex's team prefers on-demand compressed context. Give the research phase a little steering: "We're working in this part of the codebase — SCM providers, Jira and Linear integration."

The research prompt launches sub-agents that take vertical slices through the codebase and build a snapshot of the actually-true, based-on-the-code-itself parts that matter.

"You're compressing truth, not maintaining documentation." — Generate fresh, code-based research each time rather than maintaining docs that become lies.

11 Mental Alignment Through Plans 23:42

"Does anyone know what code review is for?" The answer: mental alignment. Not just correctness — keeping everyone on the team on the same page about how the codebase is changing and why.

As teams ship 2–3x more code with AI, this becomes critical. Dex can't read thousands of lines of Go every week. But he can read the plans — enough to catch problems early and maintain understanding of how the system evolves.

Mitchell Hashimoto's approach: putting Amp threads directly on pull requests so reviewers see the exact steps, prompts, and evidence that the build passed. "This takes the reviewer on a journey in a way that a GitHub PR just can't."

12 Code Snippet Plans — Compression of Intent 25:18

Plans should include actual code snippets of what's going to change. The goal is threefold:

High confidence that the model will do the right thing
Compression of intent — a concise representation of what needs to happen
Reliable execution — so reliable even a weaker model could execute it

There's a sweet spot: as plans get longer, reliability goes up but readability goes down. Every team finds their own balance.

"This is NOT magic. There is no perfect prompt. The process will not work if you do not read the plan." — A bad line of research could derail everything. A bad part of a plan could be a hundred bad lines of code.

13 Cultural Change Is the Hard Part 27:22

Dex believes the coding agent techniques will be commoditized. The hard part is organizational and cultural transformation.

A growing rift in engineering orgs:

Staff engineers don't adopt AI — it doesn't make them much faster on already-efficient workflows
Junior/mid-level engineers use it heavily — it fills skill gaps but also produces slop
Senior engineers hate it more each week — they're cleaning up AI-generated slop

"This is not AI's fault. This is not the mid-level engineer's fault. Cultural change is really hard and it needs to come from the top if it's going to work."

Scaling framework for context engineering effort:

Trivial change (button color) → Just talk to the agent
Small feature → Simple plan, maybe skip research
Medium feature → One research phase, then plan
Hard, complex problems → Full Research → Plan → Implement with iterations
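The scaling framework above amounts to a small lookup table. The sketch below encodes it directly; the enum and phase labels are illustrative names, and the category boundaries are a judgment call each team has to make for itself.

```python
from enum import Enum


class Size(Enum):
    TRIVIAL = "trivial"
    SMALL = "small"
    MEDIUM = "medium"
    HARD = "hard"


# Phases per task size, mirroring the list above. Labels are illustrative.
WORKFLOW: dict[Size, list[str]] = {
    Size.TRIVIAL: ["prompt directly"],
    Size.SMALL: ["plan", "implement"],
    Size.MEDIUM: ["research", "plan", "implement"],
    Size.HARD: ["research", "plan", "implement"],
}


def phases_for(size: Size) -> list[str]:
    """Return the phases to run for a task of the given size."""
    return list(WORKFLOW[size])


def should_iterate(size: Size) -> bool:
    """Only hard problems get repeated research/plan iterations."""
    return size is Size.HARD
```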

Parting advice: pick one tool and get some reps. Don't min-max across Claude Code, Codex, Cursor, and others. The transition to AI-assisted development is inevitable; the question is whether your team navigates it intentionally.

⚡ Key Takeaways

Context engineering > waiting for better models. You can solve hard problems in complex codebases today by managing your context window carefully.
The Dumb Zone is real. After ~40% of context window utilization, quality degrades. Build your workflow around staying in the smart zone.
Sub-agents are for context control, not role play. Use them to explore and compress, not to anthropomorphize job titles.
Frequent Intentional Compaction is the core technique: Research (compress truth) → Plan (compress intent) → Implement (execute with minimal context).
Do not outsource the thinking. AI amplifies thinking — both good and bad. Read the plans. Review the research.
On-demand context beats maintained documentation. Generate fresh, code-based research each time rather than maintaining docs that become lies.
Plans should include code snippets. The more specific your plan, the more reliable execution becomes.
Mental alignment is the point of code review. Use plans and Amp threads to keep your team aligned as AI-shipped code accelerates.
"Spec-driven dev" is semantically diffused. Focus on the actual techniques — compaction, context management — rather than buzzwords.
Cultural change is the real challenge. The tools will be commoditized; adapting your team and SDLC will not.

🕐 Timestamp Index

0:00 Intro
0:47 Context Engineering
4:43 Advanced Context
7:07 The Dumb Zone
8:31 Compaction
10:55 Research, Plan, Implement
14:29 One-shotting a fix in 300K LOC Rust
17:15 Semantic Diffusion & Spec Dev
18:47 Onboarding Agents
21:54 Internal Docs
23:42 Mental Alignment
25:18 Code Snippet Plans
27:22 Cultural Change