No Vibes Allowed — Solving Hard Problems in Complex Codebases — Dex Horthy, HumanLayer

Context Engineering · Coding Agents · Dex Horthy, HumanLayer · AI Engineer · December 2025 · ~28 min

Overview

Dex Horthy — founder of HumanLayer and author of the viral "12 Factor Agents" essay that coined the term "context engineering" — explains why AI coding tools fail in brownfield codebases and presents a battle-tested framework for fixing it. After eight weeks of intensive experimentation, his three-person team achieved 2–3x throughput using a Research → Plan → Implement workflow built entirely around context window management. This talk covers the theory (the Dumb Zone, stateless LLMs, trajectory traps) and the practice (compaction, sub-agents for context control, progressive onboarding, plans with code snippets, and mental alignment through plan review).

1 The Problem With AI Coding Today 0:00

Dex opens with a reference to a study of 100,000 developers across companies of all sizes presented at AI Engineer in June 2025. The findings were sobering:

"Too much slop. Tech debt factory. It's just not going to work for our codebase. Maybe someday when the models get better." — Common refrain from founders and senior engineers.

Dex's thesis: we don't need to wait for better models. The answer is context engineering — managing what goes into the LLM's context window to get the best output from today's models.

2 Context Engineering — From Skeptic to 2–3x Throughput 0:47

Dex admits he was initially unimpressed by Claude Code. But over eight weeks of intense experimentation, his three-person team achieved 2–3x more throughput — shipping so much code they had to fundamentally change how they collaborated.

The result went viral on Hacker News in September 2025. Thousands of developers grabbed their open-source Research → Plan → Implement prompt system from GitHub. The core goals:

  1. AI that works in brownfield codebases — not just greenfield toys
  2. Complex problem solving — not just simple changes
  3. No slop — code quality that passes expert review
  4. Mental alignment — keeping the entire team on the same page
  5. Maximum token spend — offloading as much meaningful work to AI as possible

3 Advanced Context Engineering 4:43

Dex frames the spectrum of approaches from naive to advanced:

Level 0 — The Argue Loop: Ask the agent to do something. Tell it why it's wrong. Re-steer. Repeat until you run out of context, give up, or cry.

Level 1 — Fresh Context Windows: When a conversation goes off track, start a new context window. Same prompt, same task, fresh start with a note on what didn't work. The signal to restart? When Claude starts apologizing profusely.

Level 2 — Intentional Compaction: The real breakthrough. Periodically compress your existing context window into a markdown file. Review it, tag it, and when a new agent starts, it gets straight to work without re-doing all the exploration.

LLMs are stateless. Every turn could go hundreds of right or wrong directions. The ONLY thing that influences which path the model takes is what's currently in the conversation.

Four optimization axes for context windows:

  1. Correctness — no incorrect information (the worst contaminant)
  2. Completeness — no missing information the agent needs
  3. Size — minimal noise and irrelevant content
  4. Trajectory — the conversational pattern matters (mistake→yelling→mistake teaches the model to keep failing)

4 The Dumb Zone 7:07

Dex introduces his "very academic concept": The Dumb Zone.

In Claude's ~168,000 token context window, around the 40% mark you start seeing diminishing returns in output quality. The first 40% is the "smart zone" where high-quality reasoning happens. Everything after is increasingly degraded.
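As a rough worked example of that arithmetic, the sketch below turns the talk's two numbers (a ~168,000-token window and a 40% threshold) into a token budget. Both figures are rules of thumb from the talk, not hard limits, and the function names are illustrative assumptions.

```python
WINDOW_TOKENS = 168_000   # approximate window size quoted in the talk
SMART_ZONE_PERCENT = 40   # rule-of-thumb threshold, not a hard limit


def smart_zone_budget(window: int = WINDOW_TOKENS, percent: int = SMART_ZONE_PERCENT) -> int:
    """Tokens available before quality is expected to start degrading."""
    return window * percent // 100


def context_zone(tokens_used: int, window: int = WINDOW_TOKENS) -> str:
    """Label current usage as 'smart' or 'dumb' under the 40% heuristic."""
    if not 0 <= tokens_used <= window:
        raise ValueError("tokens_used must be between 0 and the window size")
    return "smart" if tokens_used <= smart_zone_budget(window) else "dumb"


print(smart_zone_budget())    # 67200
print(context_zone(50_000))   # smart
print(context_zone(90_000))   # dumb
```

Under these assumptions the smart zone is roughly the first 67,200 tokens, which is why tool definitions and exploration output that fill the window early are so costly.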

"If you have too many MCP tools loaded in your coding agent, you're doing all your work in the dumb zone and you'll never get good results."

Geoffrey Huntley's research on coding agents: "The more you use of the context window, the worse outcomes you'll get." This reframes the entire coding agent workflow: everything is about cleverly avoiding the dumb zone.

Sub-agents are for context control, not role play. Don't create front-end, backend, and QA sub-agents. Instead, fork a new context window when you need to explore a large codebase. The sub-agent does all the reading and returns a succinct summary. The parent agent reads one file and gets to work — without consuming smart-zone tokens on exploration.
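The context-control idea can be sketched with a toy model. Everything here is hypothetical: `Context` stands in for an agent's context window, word count stands in for tokens, and a keyword search stands in for the LLM-driven exploration a real sub-agent would do. The point is only that the reading happens in a throwaway context and the parent receives a short summary.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Stand-in for one agent's context window (hypothetical, not a real API)."""
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def size(self) -> int:
        # Crude proxy for token count: whitespace-separated words.
        return sum(len(entry.split()) for entry in self.entries)


def explore_in_subagent(files: dict[str, str], needle: str) -> str:
    """Read every file in a throwaway context and return only a short summary."""
    sub = Context()  # fresh window, discarded when this function returns
    hits = []
    for path, source in files.items():
        sub.add(source)  # the expensive reading lands here, not in the parent
        for lineno, line in enumerate(source.splitlines(), start=1):
            if needle in line:
                hits.append(f"{path}:{lineno}")
    return f"'{needle}' appears at: " + ", ".join(hits)


files = {
    "auth.py": "def login():\n    check_token()\n",
    "util.py": "def check_token():\n    pass\n" + "# filler line\n" * 200,
}
parent = Context()
parent.add(explore_in_subagent(files, "check_token"))
print(parent.entries[0])  # 'check_token' appears at: auth.py:2, util.py:1
print(parent.size())      # 5
```

The parent ends up holding five words instead of several hundred, which is the whole benefit: exploration cost is paid outside the parent's smart zone.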

5 Compaction 8:31

The main consumers of context window space:

A good compaction captures: "This is exactly what we're working on. These are the exact files and line numbers that matter."

Compaction is the core mechanism for staying in the smart zone. You compress truth (research) and intent (plans) into concise markdown that gives fresh agents everything they need to start working immediately.
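A compaction file can be as simple as a goal, a list of file and line references, and the findings so far. The sketch below renders one; the section headings, the `FileRef` type, and the example path and findings are all made up for illustration and are not the format used in HumanLayer's prompts.

```python
from dataclasses import dataclass


@dataclass
class FileRef:
    """One location a fresh agent should read first (illustrative type)."""
    path: str
    start_line: int
    end_line: int
    note: str


def render_compaction(goal: str, refs: list[FileRef], findings: list[str]) -> str:
    """Render a short markdown handoff document for a fresh context window."""
    lines = ["# Compaction", "", "## What we are working on", goal, "",
             "## Files and line numbers that matter"]
    lines += [f"- {r.path}:{r.start_line}-{r.end_line}: {r.note}" for r in refs]
    lines += ["", "## Findings so far"]
    lines += [f"- {item}" for item in findings]
    return "\n".join(lines) + "\n"


# Hypothetical example content.
doc = render_compaction(
    goal="Fix the off-by-one error in span reporting.",
    refs=[FileRef("src/parser.rs", 120, 148, "builds spans for tokens")],
    findings=["Spans are 0-indexed in the lexer but 1-indexed in diagnostics."],
)
print(doc)
```

Keeping the structure this small makes the file quick for a human to review before it is handed to the next agent.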

6 Research → Plan → Implement 10:55

The core workflow is structured around three phases of context management:

Phase 1: Research — Understand how the system works. Sub-agents explore the codebase and produce a compressed markdown document with the specific files, code flows, dependencies, and exact line numbers that matter. This is a compression of truth derived from actual code, not stale docs.

Phase 2: Planning — Takes research output plus the bug ticket/feature requirement and creates a detailed implementation plan with exact steps, file names, line numbers, actual code snippets, and how to test after every change.

"If you read one of these plans, you can see very easily how the dumbest model in the world is probably not going to screw this up."

Phase 3: Implementation — With a good plan, this is "the least exciting part." The agent executes the plan step by step. Context stays low because all exploration and decision-making happened earlier.
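The three phases can be sketched as a pipeline in which each phase starts from a fresh context and receives only the previous phase's markdown artifact. This is a minimal illustration under stated assumptions: `Agent` is a hypothetical callable representing one fresh context window per call, the prompt wording is invented, and `review` stands for the human checkpoint where research and plans are read before moving on.

```python
from typing import Callable

Agent = Callable[[str], str]   # hypothetical: one fresh context window per call
Review = Callable[[str], str]  # human checkpoint: read, correct, return the document


def research(agent: Agent, task: str) -> str:
    return agent(f"RESEARCH\nTask: {task}\nList the files, flows, and line numbers that matter.")


def plan(agent: Agent, task: str, research_md: str) -> str:
    return agent(f"PLAN\nTask: {task}\nResearch:\n{research_md}\nWrite exact steps, snippets, and tests.")


def implement(agent: Agent, plan_md: str) -> str:
    # Only the plan is passed in; exploration output stays out of this context.
    return agent(f"IMPLEMENT\nPlan:\n{plan_md}\nExecute step by step.")


def run_workflow(agent: Agent, task: str, review: Review) -> str:
    research_md = review(research(agent, task))
    plan_md = review(plan(agent, task, research_md))
    return implement(agent, plan_md)


# Fake agent for demonstration: records each prompt and returns a labelled artifact.
seen: list[str] = []


def fake_agent(prompt: str) -> str:
    seen.append(prompt)
    return prompt.splitlines()[0].lower() + " artifact"


result = run_workflow(fake_agent, "fix flaky parser test", review=lambda doc: doc)
print(result)                          # implement artifact
print("research artifact" in seen[2])  # False
```

The last line shows the design choice: the implementation prompt carries the plan but not the research, so its context stays small.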

7 One-Shotting a Fix in a 300K LOC Rust Codebase 14:29

Dex battle-tested the workflow on his podcast with Vaibhav Gupta (CEO of Boundary ML, makers of BAML):

Test 1: A one-shot fix in BAML's 300,000-line Rust codebase, which implements a programming language. In 90 minutes they built research documents and compared plans with and without research. By Tuesday morning, the CTO confirmed: "Yeah, this looks good. We'll get it in the next release."

Test 2: A 7-hour Saturday session shipped 35,000 lines of code to BAML. Vaibhav estimated it represented 1–2 weeks of manual work.

Test 3: Removing Hadoop dependencies from Parquet Java. It did not go well — they threw everything out and went back to the whiteboard.

"Do not outsource the thinking. AI cannot replace thinking. It can only amplify the thinking you have done — or the lack of thinking you have done."

8 Semantic Diffusion & Spec-Driven Dev 17:15

Dex takes a firm stance: "spec-driven development" is broken — not the idea, but the phrase itself.

He invokes Martin Fowler's 2006 concept of semantic diffusion: a good term gets popular, everyone starts using it to mean different things, and it becomes useless. It already happened with "agent." Now it is happening with "spec-driven dev," which variously means writing a better prompt, a PRD, verifiable feedback loops, treating code like assembly, using markdown files, or even library documentation.

"Spec-driven dev is overhyped. It's useless now. It's semantically diffused." — Focus on actual techniques (compaction, context management) rather than buzzwords.

9 Onboarding Agents — The Memento Problem 18:47

Dex references the movie Memento: a man wakes up with no memory and reads his own tattoos. Every new agent context is that man. If you don't onboard your agents, they will make things up.

Naive approach: A massive onboarding doc in the repo root. Problem: it either consumes all your smart-zone tokens or is too incomplete.

Better approach — Progressive disclosure: Shard onboarding context down the stack. Root-level file provides high-level context, then each directory adds deeper context relevant to that area. The agent pulls root context plus only the sub-context for its current task — leaving plenty of room in the smart zone.
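One way to picture progressive disclosure is a loader that walks from the repo root down to the directory being worked on and collects whatever onboarding file exists at each level. The sketch below assumes a per-directory file named `CONTEXT.md`; that filename, the function name, and the demo directory layout are illustrative assumptions, not a convention from the talk.

```python
import tempfile
from pathlib import Path


def collect_onboarding(repo_root: Path, target_dir: Path, filename: str = "CONTEXT.md") -> list[Path]:
    """Return onboarding files from the repo root down to target_dir, root first."""
    root = repo_root.resolve()
    # Raises ValueError if target_dir is not inside the repo.
    relative = target_dir.resolve().relative_to(root)
    found = []
    current = root
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
    return found


# Demo on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    billing = root / "services" / "billing"
    auth = root / "services" / "auth"
    billing.mkdir(parents=True)
    auth.mkdir(parents=True)
    (root / "CONTEXT.md").write_text("High-level architecture\n")
    (billing / "CONTEXT.md").write_text("Billing internals\n")
    (auth / "CONTEXT.md").write_text("Auth internals\n")

    files = collect_onboarding(root, billing)
    loaded = [p.relative_to(root).as_posix() for p in files]
    print(loaded)  # ['CONTEXT.md', 'services/billing/CONTEXT.md']
```

An agent working in the billing directory loads the root file and the billing file, and never spends tokens on the auth notes.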

The staleness problem: Rank code, function names, comments, and documentation by "the amount of lies you can find." Documentation has the most lies, code has the least.

10 Internal Docs vs On-Demand Context 21:54

Instead of maintaining documentation, Dex's team prefers on-demand compressed context. Give the research phase a little steering: "We're working in this part of the codebase — SCM providers, Jira and Linear integration."

The research prompt launches sub-agents that take vertical slices through the codebase and build a snapshot of the actually-true, based-on-the-code-itself parts that matter.

"You're compressing truth, not maintaining documentation." — Generate fresh, code-based research each time rather than maintaining docs that become lies.

11 Mental Alignment Through Plans 23:42

"Does anyone know what code review is for?" The answer: mental alignment. Not just correctness — keeping everyone on the team on the same page about how the codebase is changing and why.

As teams ship 2–3x more code with AI, this becomes critical. Dex can't read thousands of lines of Go every week. But he can read the plans — enough to catch problems early and maintain understanding of how the system evolves.

Mitchell Hashimoto's approach: putting Amp threads directly on pull requests so reviewers see the exact steps, prompts, and evidence that the build passed. "This takes the reviewer on a journey in a way that a GitHub PR just can't."

12 Code Snippet Plans — Compression of Intent 25:18

Plans should include actual code snippets of what's going to change. The goal is threefold:

There's a sweet spot: as plans get longer, reliability goes up but readability goes down. Every team finds their own balance.

"This is NOT magic. There is no perfect prompt. The process will not work if you do not read the plan." — A bad line of research could derail everything. A bad part of a plan could be a hundred bad lines of code.

13 Cultural Change Is the Hard Part 27:22

Dex believes the coding agent techniques will be commoditized. The hard part is organizational and cultural transformation.

A growing rift in engineering orgs:

"This is not AI's fault. This is not the mid-level engineer's fault. Cultural change is really hard and it needs to come from the top if it's going to work."

Scaling framework for context engineering effort:

Parting advice: pick one tool and get some reps. Don't min-max across Claude Code, Codex, Cursor, and others. The transition to AI-assisted development is inevitable; the question is whether your team navigates it intentionally.

⚡ Key Takeaways

  1. Context engineering > waiting for better models. You can solve hard problems in complex codebases today by managing your context window carefully.
  2. The Dumb Zone is real. After ~40% of context window utilization, quality degrades. Build your workflow around staying in the smart zone.
  3. Sub-agents are for context control, not role play. Use them to explore and compress, not to anthropomorphize job titles.
  4. Frequent Intentional Compaction is the core technique: Research (compress truth) → Plan (compress intent) → Implement (execute with minimal context).
  5. Do not outsource the thinking. AI amplifies thinking — both good and bad. Read the plans. Review the research.
  6. On-demand context beats maintained documentation. Generate fresh, code-based research each time rather than maintaining docs that become lies.
  7. Plans should include code snippets. The more specific your plan, the more reliable execution becomes.
  8. Mental alignment is the point of code review. Use plans and Amp threads to keep your team aligned as AI-shipped code accelerates.
  9. "Spec-driven dev" is semantically diffused. Focus on the actual techniques — compaction, context management — rather than buzzwords.
  10. Cultural change is the real challenge. The tools will be commoditized; adapting your team and SDLC will not.

🕐 Timestamp Index

0:00 Intro
0:47 Context Engineering
4:43 Advanced Context
7:07 The Dumb Zone
8:31 Compaction
10:55 Research, Plan, Implement
14:29 One-shotting a fix in 300K LOC Rust
17:15 Semantic Diffusion & Spec Dev
18:47 Onboarding Agents
21:54 Internal Docs
23:42 Mental Alignment
25:18 Code Snippet Plans
27:22 Cultural Change