Building Pi in a World of Slop — Deep Dive

🎤 AI Engineer (talk by Mario Zechner) ⏱ ~18:25

📋 Overview

Mario Zechner, creator of the Pi coding agent, delivers a conference talk in three acts: (1) why he stopped using Claude Code and built Pi, (2) how AI-generated spam ("clankers") is destroying open source, and (3) a passionate argument for slowing down and writing critical code by hand. It is a raw, opinionated, and deeply technical talk about agent harness design, context control, and the compounding-error problem of agent-generated code.

Act One — Why He Built Pi ▶ 0:29

He started using Claude Code in April 2025. Initially it was simple, predictable, and fit his workflow. But problems emerged, the biggest being that you don't control your own context.

He looked at alternatives: Amp and FactoryDroid ("the Porsche and Lamborghini"), and OpenCode. OpenCode has a brilliant team, but he found issues: tool-output pruning that "lobotomizes the model", an LSP integration that checks for errors after every edit and confuses the model, individual messages stored as separate JSON files, and a CORS security issue.

The Terminal Bench Revelation ▶ 4:46

He discovered Terminal Bench, whose minimal harness gives the model a single capability: sending keystrokes to a tmux session. No file tools, no sub-agents. Yet that harness scored among the highest, often beating the models' own native harnesses.
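To make the keystroke-only idea concrete, here is a minimal sketch of such a tool. It relies on the real tmux CLI (`send-keys -l` to send text literally, `capture-pane -p` to print the pane), but the `SendKeysCall` type, the function names, and the injected `exec` callback are hypothetical illustrations, not Terminal Bench's actual harness code.

```typescript
// Hypothetical tool-call shape: the model's only capability is typing into tmux.
interface SendKeysCall {
  session: string;    // tmux target, e.g. "agent"
  keys: string;       // text the model wants to type
  pressEnter: boolean;
}

// Runs a program with arguments and returns its stdout. Injected so the sketch
// stays free of Node-specific imports and is easy to test.
type Exec = (cmd: string, args: string[]) => string;

// argv lists for `tmux send-keys`. `-l` sends the text literally, so words such
// as "Enter" or "C-c" inside the model's text are not treated as key names.
function sendKeysArgs(call: SendKeysCall): string[][] {
  const commands: string[][] = [["send-keys", "-t", call.session, "-l", call.keys]];
  if (call.pressEnter) {
    // Separate invocation without -l, so "Enter" IS interpreted as a key.
    commands.push(["send-keys", "-t", call.session, "Enter"]);
  }
  return commands;
}

// argv for reading the screen back: `capture-pane -p` prints the pane to stdout.
function capturePaneArgs(session: string): string[] {
  return ["capture-pane", "-p", "-t", session];
}

// Execute one tool call and return the visible screen, which becomes the
// tool result the model sees on its next turn.
function runSendKeys(call: SendKeysCall, exec: Exec): string {
  for (const args of sendKeysArgs(call)) {
    exec("tmux", args);
  }
  return exec("tmux", capturePaneArgs(call.session));
}
```

In Node, `exec` could be `(cmd, args) => execFileSync(cmd, args, { encoding: "utf8" })` from `node:child_process`. A real harness would also wait for the command's output to settle before capturing the pane.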

Two theses:

Pi's Architecture ▶ 5:46

He stripped everything down and built a minimal but extensible core. The agent can modify itself.

Four packages: AI (provider abstraction), Agent Core (a while loop plus tool calling), TUI (a bespoke terminal UI framework shaped by his game-dev background; it doesn't flicker), and Coding Agent.
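The Agent Core's "while loop plus tool calling" can be sketched in a few lines. This is a hypothetical illustration, not Pi's code: the message types, `Model`, `Tool`, and `runAgent` are invented here, and the model call is synchronous for brevity where a real provider call would be async.

```typescript
// Hypothetical, provider-agnostic message types (not Pi's actual API).
type ToolCall = { id: string; name: string; args: Record<string, string> };
type AssistantMessage = { role: "assistant"; text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user"; text: string }
  | AssistantMessage
  | { role: "tool"; toolCallId: string; text: string };

// The model maps the transcript so far to its next message (sync for brevity).
type Model = (history: Message[]) => AssistantMessage;
// A tool takes string arguments and returns text for the model to read.
type Tool = (args: Record<string, string>) => string;

// The core loop: ask the model, run any requested tools, feed the results back,
// and repeat until the model answers without requesting a tool.
function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  prompt: string,
  maxTurns = 50,
): Message[] {
  const history: Message[] = [{ role: "user", text: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(history);
    history.push(reply);
    if (reply.toolCalls.length === 0) return history; // model is done
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      let result: string;
      try {
        result = tool ? tool(call.args) : `unknown tool: ${call.name}`;
      } catch (err) {
        // Tool failures go back to the model as text instead of crashing the loop.
        result = `error: ${String(err)}`;
      }
      history.push({ role: "tool", toolCallId: call.id, text: result });
    }
  }
  throw new Error(`agent did not finish within ${maxTurns} turns`);
}
```

The loop ends when the model replies without tool calls; everything else (compaction, UI, extensions) can sit around this small core.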

Pi's system prompt is nearly empty (he shows it on screen): "That's it." Models are reinforcement-trained as coding agents, so they don't need 10,000 tokens telling them they're a coding agent. They know.

Four tools only: read, write, edit, bash. ▶ 7:01
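Of the four tools, `edit` has the most interesting semantics. One common design in coding agents, assumed here rather than taken from Pi's source, is exact-string replacement that refuses missing or ambiguous matches; `applyEdit` is a hypothetical name.

```typescript
// Hypothetical "edit" semantics: replace exactly one occurrence of oldText.
// Missing or ambiguous matches are rejected rather than guessed at.
function applyEdit(content: string, oldText: string, newText: string): string {
  if (oldText.length === 0) throw new Error("oldText must not be empty");
  const first = content.indexOf(oldText);
  if (first === -1) {
    throw new Error("oldText not found; re-read the file");
  }
  // A second hit (overlapping or not) means the target is ambiguous.
  if (content.indexOf(oldText, first + 1) !== -1) {
    throw new Error("oldText matches more than once; include more context");
  }
  return content.slice(0, first) + newText + content.slice(first + oldText.length);
}
```

Failing loudly on an ambiguous match pushes the model to `read` the file again and supply more surrounding context, instead of silently editing the wrong spot.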

YOLO by default: no approval dialogs. "My security needs are different than yours." Instead, Pi gives you the rope to build your own security. ▶ 7:27
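Building "your own security" could look like a pre-tool-call hook. The hook shape (`BeforeToolCall`), `makeGuard`, `isInside`, and the deny-list are hypothetical: a sketch of the kind of policy one user might write, not Pi's actual extension API.

```typescript
// Hypothetical hook shape (not Pi's actual API): called before every tool call.
// Returning a string blocks the call and is reported to the model as the reason;
// returning undefined lets it run.
type ToolCallEvent = { tool: string; args: Record<string, string> };
type BeforeToolCall = (event: ToolCallEvent) => string | undefined;

// True if `path` is `root` itself or somewhere beneath it. Any ".." segment is
// rejected outright instead of being resolved: stricter, but simpler.
function isInside(root: string, path: string): boolean {
  if (path.split("/").includes("..")) return false;
  const base = root.replace(/\/+$/, "");
  return path === base || path.startsWith(base + "/");
}

// One user's policy: block some destructive shell patterns and any write/edit
// outside the project directory. Pi's YOLO default would allow both.
// Pass regexes without the g flag: test() on a global regex is stateful.
function makeGuard(projectRoot: string, deniedCommands: RegExp[]): BeforeToolCall {
  return ({ tool, args }) => {
    if (tool === "bash") {
      const command = args.command ?? "";
      const rule = deniedCommands.find((re) => re.test(command));
      if (rule) return `blocked by policy ${rule}`;
    }
    if ((tool === "write" || tool === "edit") && !isInside(projectRoot, args.path ?? "")) {
      return `refusing to modify files outside ${projectRoot}`;
    }
    return undefined;
  };
}
```

The point of the design is that the policy lives in user code: one person blocks `git push --force`, another allows everything.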

Sub-agents, plan mode, and MCP are NOT built in. You ask Pi to build them as extensions tailored to your needs.

Extensions — Pi's Superpower ▶ 8:02

Extensions are TypeScript modules. The extension API hooks into everything: tools, slash commands, events, session state, custom compaction, custom providers, and full tool control.
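A rough sketch of what hooking into tools, slash commands, events, and session state can mean in practice. `ExtensionHost`, `registerTool`, `registerCommand`, `on`/`emit`, and the `todoExtension` example are all hypothetical stand-ins; they are not Pi's real extension API.

```typescript
// Hypothetical stand-ins for an extension API; not Pi's real interface.
type ToolDef = {
  name: string;
  description: string; // shown to the model so it knows when to call the tool
  run: (args: Record<string, string>) => string;
};
type CommandFn = (argText: string) => string;
type Handler = (payload: unknown) => void;

class ExtensionHost {
  readonly tools = new Map<string, ToolDef>();
  readonly commands = new Map<string, CommandFn>();
  private readonly listeners = new Map<string, Handler[]>();

  registerTool(def: ToolDef): void {
    this.tools.set(def.name, def);
  }

  registerCommand(name: string, run: CommandFn): void {
    this.commands.set(name, run);
  }

  on(event: string, handler: Handler): void {
    const list = this.listeners.get(event) ?? [];
    list.push(handler);
    this.listeners.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    for (const handler of this.listeners.get(event) ?? []) handler(payload);
  }

  // Dispatch a line such as "/todos" or "/plan write tests" to its command.
  runSlash(line: string): string {
    const parts = line.replace(/^\//, "").split(" ");
    const name = parts[0] ?? "";
    const command = this.commands.get(name);
    return command ? command(parts.slice(1).join(" ")) : `unknown command: /${name}`;
  }
}

// An extension is just a function that registers what it needs.
type Extension = (host: ExtensionHost) => void;

// Example: a tiny todo list kept as extension-owned session state.
const todoExtension: Extension = (host) => {
  const todos: string[] = [];
  host.registerTool({
    name: "todo_add",
    description: "Add an item to the session todo list",
    run: (args) => {
      todos.push(args.item ?? "");
      return `added (${todos.length} total)`;
    },
  });
  host.registerCommand("todos", () => todos.map((t, i) => `${i + 1}. ${t}`).join("\n") || "(empty)");
  host.on("session_end", () => {
    todos.length = 0; // clear state when the session ends
  });
};
```

Because an extension is plain code, a sub-agent or plan mode is just another module that registers tools and commands the same way.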

Packaged via NPM or GitHub — "we don't need another silo called a marketplace. We already have package managers."

Everything hot-reloads during a session (the game-dev philosophy: keep iteration time short).

Examples ▶ 9:22

"How do you build a Pi extension? You don't. You tell Pi to build it for you." ▶ 10:01

On Terminal Bench, Pi placed 6th, before it even had compaction.

Act Two — OSS in the Age of Clankers ▶ 10:57

"Clankers" (AI-generated spam) are destroying open source. tldraw closed their issue tracker. OpenCode is flooded. Pi's own tracker filled with garbage from OpenClaw instances that use Pi as their agent core without their users knowing.

Mario's anti-clanker system ▶ 11:14

Act Three — Slow The F*** Down ▶ 12:04

The core argument: "Everything's broken."

"Our product's been 100% built by agents." — "Yes, we know it sucks now. Congratulations."

Agents compound errors ("booboos") with zero learning, no bottlenecks, and delayed pain that lands on you. Visualization: 1 human → manageable errors; 1 agent → more errors; 10 agents → exponential errors. ▶ 13:00

"But I have a review agent!" His answer: "Let me introduce you to the wonderful world of the ouroboros." An agent reviewing agent output is the snake eating its own tail, and it doesn't work.
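The compounding argument can be made concrete with a toy recurrence, E(t+1) = E(t) × (1 + c) + N × r(t), where existing defects breed new ones at rate c, N workers each add r(t) defects per day, and a learning worker's r(t) shrinks over time. The function name and every parameter value below are invented for illustration; nothing here is data from the talk.

```typescript
// Toy model, all parameters hypothetical:
// - workers: how many humans/agents write code in parallel (the "bottleneck")
// - errorsPerDay: new defects one worker introduces per day
// - learning: fraction by which a worker's error rate drops each day (0 = never learns)
// - compounding: fraction of existing defects that spawn a new one each day,
//   because new code is built on top of broken code
function cumulativeErrors(
  days: number,
  workers: number,
  errorsPerDay: number,
  learning: number,
  compounding: number,
): number {
  let total = 0;
  let rate = errorsPerDay;
  for (let d = 0; d < days; d++) {
    total = total * (1 + compounding) + workers * rate;
    rate *= 1 - learning; // only learners get better over time
  }
  return total;
}

const human = cumulativeErrors(14, 1, 2, 0.2, 0.05);     // slow, learns
const oneAgent = cumulativeErrors(14, 1, 20, 0, 0.05);   // fast, never learns
const tenAgents = cumulativeErrors(14, 10, 20, 0, 0.05); // no bottleneck at all
console.log({ human, oneAgent, tenAgents });
```

With learning switched off (`learning = 0`) and no cap on the number of workers, the daily inflow never shrinks and scales with agent count, while the compounding term makes the total bend upward over time.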

Why agents produce complexity ▶ 14:00

They learned from the internet (90% garbage code). Every decision they make is local. They add abstractions, duplication, and backwards compatibility everywhere: "Enterprise-grade complexity within 2 weeks with just two humans and 10 agents."

"But my detailed spec." — "A sufficiently detailed spec is a program." Blanks in specs get filled with internet garbage.

Humans vs agents: humans are fallible BUT they learn, they're bottlenecks (limited booboos per day), and they feel pain. Pain triggers action (quit, blame someone, or refactor). "Agents will happily keep shitting into your code base."

How We Should Work ▶ 16:12

Good agent tasks:

Pattern: agent works → you evaluate → take what's reasonable (most isn't) → finalize.

Final advice ▶ 16:56

🔑 Key Takeaways

  1. Claude Code's biggest problem: you don't control your own context
  2. Terminal Bench proves minimal harnesses can outperform feature-rich ones
  3. Pi's philosophy: 4 tools, ~empty system prompt, self-modifying via extensions
  4. Models are already reinforcement-trained as coding agents — they don't need verbose instructions
  5. Extensions > built-in features: let Pi build what YOU need
  6. AI-generated PRs ("clankers") are destroying open source — require human vouching
  7. Agents compound errors with zero learning and no pain mechanism
  8. "A sufficiently detailed spec is a program" — blanks get filled with internet garbage
  9. Humans are bottlenecks and that's a FEATURE, not a bug
  10. Critical code: write by hand. The friction IS the learning.

🕐 Timestamp Index