Building Pi in a World of Slop

📋 Overview

Mario Zechner, creator of the Pi coding agent, delivers a conference talk in three acts: (1) why he stopped using Claude Code and built Pi, (2) how AI-generated spam ("clankers") is destroying open source, and (3) a passionate argument for slowing down and writing critical code by hand. A raw, opinionated, and deeply technical talk about agent harness design, context control, and the compounding error problem of agent-generated code.

Act One — Why He Built Pi ▶ 0:29

Started using Claude Code in April 2025. Initially simple, predictable, fit his workflow. But problems emerged:

Token madness — team grew, shipped many features he didn't need. With velocity came bugs. "If my hammer breaks every day, I'm getting really mad."
Context wasn't his context — Claude Code controls context behind your back: system prompt changes on every release, tools removed/modified without notice, system reminders injected at inopportune places ("may or may not be relevant"), confusing the model and breaking workflows.
Zero observability — can't see what agents are doing.
Zero model choice — locked to Anthropic models.
Zero extensibility — hooks are shallow, each spawns a new process.

Looked at alternatives: Amp and FactoryDroid ("the Porsche and Lamborghini") and OpenCode. OpenCode has a brilliant team, but he found issues: tool output pruning that "lobotomizes the model", LSP integration that checks for errors after every edit and confuses the model, individual messages stored as separate JSON files, and a CORS security issue.

The Terminal Bench Revelation ▶ 4:46

Discovered Terminal Bench, a benchmark whose bare-bones harness lets the model do one thing only: send keystrokes to a tmux session. No file tools, no sub-agents. Yet that setup scored among the highest, often beating native harnesses.
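To make the "keystrokes only" idea concrete, here is a minimal sketch of what such an interface could look like. It is not Terminal Bench's actual implementation. The function names are hypothetical, and only the tmux subcommands and flags (`send-keys -t`, `-l`, `capture-pane -p`) are real. The functions only build argument lists, so the sketch runs without tmux installed.

```typescript
// Hypothetical helper names; only the tmux subcommands and flags are real.
type Argv = string[];

// Commands that type `text` into the session, then optionally press Enter.
// `-l` sends the text literally, so a word like "Enter" inside `text`
// is typed as characters instead of being treated as a key name.
function sendKeysCommands(session: string, text: string, pressEnter: boolean = true): Argv[] {
  const commands: Argv[] = [["tmux", "send-keys", "-t", session, "-l", text]];
  if (pressEnter) {
    commands.push(["tmux", "send-keys", "-t", session, "Enter"]);
  }
  return commands;
}

// Command that prints the pane contents to stdout. This would be the
// model's only way to observe what its keystrokes did.
function capturePaneCommand(session: string): Argv {
  return ["tmux", "capture-pane", "-p", "-t", session];
}

// Example: type "ls -la" into an assumed session named "bench" and press Enter.
const exampleKeystrokes: Argv[] = sendKeysCommands("bench", "ls -la");
```

A harness built this way would hand each argument list to a process spawner and feed the captured pane text back to the model as the next observation.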

Two theses:

We're in the "fuck around and find out" phase — current agent form is NOT the final form
We need better ways to experiment — self-modifying, malleable agents

Pi's Architecture ▶ 5:46

Stripped everything down, built minimal but extensible core. Agent can modify itself.

Four packages: AI (provider abstraction), Agent Core (while loop + tool calling), TUI (bespoke framework from game dev background — doesn't flicker), Coding Agent.
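The "while loop + tool calling" description of Agent Core can be sketched in a few lines. This is an illustration under stated assumptions, not Pi's code: every type and function name below is hypothetical, and the model is a synchronous stub (a real provider call would be asynchronous).

```typescript
// All names here are hypothetical; this is not Pi's API.
type LoopToolCall = { name: string; args: Record<string, string> };
type LoopTurn = { text: string; toolCalls: LoopToolCall[] };
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };
type LoopModel = (history: ChatMessage[]) => LoopTurn;
type LoopTool = (args: Record<string, string>) => string;

// Call the model, run any tools it asks for, append the results,
// and repeat until the model answers without requesting a tool.
function runAgentLoop(
  model: LoopModel,
  tools: Map<string, LoopTool>,
  prompt: string,
  maxTurns: number = 20,
): ChatMessage[] {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  let turns = 0;
  while (turns < maxTurns) {
    turns++;
    const turn = model(history);
    history.push({ role: "assistant", content: turn.text });
    if (turn.toolCalls.length === 0) break; // no tool requested: the model is done
    for (const call of turn.toolCalls) {
      const tool = tools.get(call.name);
      const result = tool !== undefined ? tool(call.args) : `unknown tool: ${call.name}`;
      history.push({ role: "tool", content: result });
    }
  }
  return history;
}

// Stub model: requests one tool call, then answers once it sees a tool result.
const scriptedModel: LoopModel = (history) => {
  const sawToolResult = history.some((m) => m.role === "tool");
  return sawToolResult
    ? { text: "done", toolCalls: [] }
    : { text: "listing files", toolCalls: [{ name: "bash", args: { command: "ls" } }] };
};

// Stub tool: echoes the command instead of executing it.
const stubTools = new Map<string, LoopTool>([
  ["bash", (args) => `ran: ${args["command"]}`],
]);

const loopHistory = runAgentLoop(scriptedModel, stubTools, "what files are here?");
// Roles in loopHistory: user, assistant, tool, assistant
```

The `maxTurns` cap is an added safety assumption so that a misbehaving model cannot loop forever.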

Pi's system prompt: [shows nearly empty prompt]. "That's it." Models are already reinforcement-trained as coding agents, so they don't need 10,000 tokens telling them they're a coding agent. They know.

Four tools only: read, write, edit, bash. ▶ 7:01
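As a rough illustration of how small such a tool surface is, the sketch below implements read, write, and edit against an in-memory store. The talk does not describe how Pi's tools behave internally, so the semantics here are assumptions, in particular that edit replaces exactly one unique occurrence of a string. All names are hypothetical, and bash is left out because it needs a real process spawner.

```typescript
// Hypothetical in-memory stand-ins for read, write, and edit.
type MemStore = Map<string, string>;

function memRead(store: MemStore, path: string): string {
  const content = store.get(path);
  if (content === undefined) throw new Error(`no such file: ${path}`);
  return content;
}

function memWrite(store: MemStore, path: string, content: string): void {
  store.set(path, content);
}

// Assumed semantics: replace exactly one occurrence of `oldText`.
// Failing on zero or multiple matches forces the caller to be precise.
function memEdit(store: MemStore, path: string, oldText: string, newText: string): void {
  if (oldText === "") throw new Error("oldText must not be empty");
  const content = memRead(store, path);
  const first = content.indexOf(oldText);
  if (first === -1) throw new Error("oldText not found");
  if (content.indexOf(oldText, first + 1) !== -1) throw new Error("oldText is ambiguous");
  store.set(path, content.slice(0, first) + newText + content.slice(first + oldText.length));
}

const demoStore: MemStore = new Map<string, string>();
memWrite(demoStore, "a.txt", "hello world");
memEdit(demoStore, "a.txt", "world", "pi");
const demoContent = memRead(demoStore, "a.txt"); // "hello pi"
```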

YOLO by default — no approval dialogs. "My security needs are different than yours." Instead gives you rope to build your own security. ▶ 7:27

Sub-agents, plan mode, MCP — NOT built in. You ask Pi to build them as extensions based on your needs.

Extensions — Pi's Superpower ▶ 8:02

Extensions are TypeScript modules. Extension API hooks into everything: tools, slash commands, events, session state, custom compaction, custom providers, full tool control.
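The shape of "a TypeScript module that hooks into tools and slash commands" might look roughly like the sketch below. It does not use Pi's real extension API, which the talk does not spell out. The host class, interface, and hook names are all hypothetical. The example extension also shows how "rope to build your own security" could work: a tool-call hook that blocks one dangerous bash pattern.

```typescript
// Hypothetical host and API; not Pi's actual extension interface.
type ToolCallEvent = { tool: string; input: string };
type Verdict = { block: boolean; reason?: string };
type ToolCallHandler = (event: ToolCallEvent) => Verdict | undefined;
type CommandHandler = (args: string) => string;

interface SketchExtensionApi {
  onToolCall(handler: ToolCallHandler): void;
  registerCommand(name: string, run: CommandHandler): void;
}

class SketchHost implements SketchExtensionApi {
  private handlers: ToolCallHandler[] = [];
  private commands = new Map<string, CommandHandler>();

  onToolCall(handler: ToolCallHandler): void {
    this.handlers.push(handler);
  }

  registerCommand(name: string, run: CommandHandler): void {
    this.commands.set(name, run);
  }

  // An extension is a function that receives the API and registers hooks.
  load(extension: (api: SketchExtensionApi) => void): void {
    extension(this);
  }

  // Returns the first blocking verdict, or { block: false } if none blocks.
  checkToolCall(event: ToolCallEvent): Verdict {
    for (const handler of this.handlers) {
      const verdict = handler(event);
      if (verdict !== undefined && verdict.block) return verdict;
    }
    return { block: false };
  }

  runCommand(name: string, args: string): string {
    const run = this.commands.get(name);
    return run !== undefined ? run(args) : `unknown command: ${name}`;
  }
}

// Example extension: veto "rm -rf" in bash calls and add a /hello command.
function guardExtension(api: SketchExtensionApi): void {
  api.onToolCall((event) => {
    if (event.tool === "bash" && /\brm\s+-rf\b/.test(event.input)) {
      return { block: true, reason: "rm -rf is not allowed" };
    }
    return undefined;
  });
  api.registerCommand("hello", (args) => `hello ${args}`);
}

const sketchHost = new SketchHost();
sketchHost.load(guardExtension);
```

In this design the core stays YOLO by default, and any policy a user wants is an extension they load rather than a dialog the agent imposes.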

Packaged via NPM or GitHub — "we don't need another silo called a marketplace. We already have package managers."

Everything hot-reloads during sessions (game dev philosophy: low iteration time).

Examples ▶ 9:22

Anthropic's /byTheWay feature — someone rebuilt it in 5 minutes as a Pi extension with more features
Nico's chat room for Pi agents talking to each other (custom UI)
NES games and Doom running inside Pi

"How do you build a Pi extension? You don't. You tell Pi to build it for you." ▶ 10:01

Terminal Bench: Pi scored 6th place — before even having compaction.

Act Two — OSS in the Age of Clankers ▶ 10:57

"Clankers" (AI-generated spam) are destroying open source. tldraw closed their issue tracker. OpenCode flooded. Pi's tracker filled with garbage from OpenClaw instances using Pi as agent core without users knowing.

Mario's anti-clanker system ▶ 11:14

Auto-close all PRs with a comment asking for a human-written issue (max one screen of text)
If human responds → gets "vouched", future PRs allowed through
Clankers never read the comment, never come back — perfect filter
Labels deprioritizing issues from OpenClaw interactions
3D embedding visualization of issues to see clusters
"OS certification" — just closes the tracker whenever he wants his life back

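The auto-close and vouching flow described above can be sketched as a small state machine. This is an illustration, not Mario's actual tooling. The class name, the comment text, and the 40-line stand-in for "one screen of text" are assumptions. The sketch also cannot tell a human from a bot. It only encodes the flow, which works in practice because clankers never reply.

```typescript
// Assumed stand-in for "max one screen of text".
const MAX_ISSUE_LINES = 40;

type TriageAction =
  | { kind: "allow" }
  | { kind: "close"; comment: string };

// Hypothetical registry of authors who have replied like a human.
class VouchRegistry {
  private vouched = new Set<string>();

  // PRs from unvouched authors are closed with a request for a short,
  // human-written issue. Vouched authors pass through.
  triagePullRequest(author: string): TriageAction {
    if (this.vouched.has(author)) return { kind: "allow" };
    return {
      kind: "close",
      comment: `Please describe this in a human-written issue (max ${MAX_ISSUE_LINES} lines).`,
    };
  }

  // A non-empty reply that fits on one screen vouches the author.
  recordReply(author: string, reply: string): boolean {
    const trimmed = reply.trim();
    const ok = trimmed.length > 0 && trimmed.split("\n").length <= MAX_ISSUE_LINES;
    if (ok) this.vouched.add(author);
    return ok;
  }
}

const vouchRegistry = new VouchRegistry();
const firstAttempt = vouchRegistry.triagePullRequest("alice"); // closed with comment
vouchRegistry.recordReply("alice", "The TUI crashes when I resize the window.");
const secondAttempt = vouchRegistry.triagePullRequest("alice"); // allowed
```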
Act Three — Slow The F*** Down ▶ 12:04

The core argument: "Everything's broken."

"Our product's been 100% built by agents." — "Yes, we know it sucks now. Congratulations."

Agents compound errors ("booboos") with zero learning, no bottlenecks, and delayed pain (for you). Visualization: 1 human → manageable errors. 1 agent → more errors. 10 agents → exponential errors. ▶ 13:00

"But I have a review agent!" — "Let me introduce you to the wonderful world of the ouroboros." Doesn't work.

Why agents produce complexity ▶ 14:00

They learned from the internet (90% garbage code). Every decision is local. They add abstractions, duplication, backwards compatibility everywhere. "Enterprise-grade complexity within 2 weeks with just two humans and 10 agents."

"But my detailed spec." — "A sufficiently detailed spec is a program." Blanks in specs get filled with internet garbage.

Humans vs agents: humans are fallible BUT they learn, they're bottlenecks (limited booboos per day), and they feel pain. Pain triggers action (quit, blame someone, or refactor). "Agents will happily keep shitting into your code base."

How We Should Work ▶ 16:12

Good agent tasks:

Scopeable — agent guaranteed to find everything needed
Evaluable — function to measure quality
Non-mission-critical
Boring stuff, reproduction cases for user issues, rubber ducking

Pattern: agent works → you evaluate → take what's reasonable (most isn't) → finalize.
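The criteria and the pattern above can be written down as a tiny checklist plus a filter. This is only a sketch: the type, the function names, and the numeric score are assumptions made for illustration, and in the talk the evaluation step is the human's job.

```typescript
// Hypothetical encoding of the talk's criteria for a good agent task.
type TaskProfile = {
  scopeable: boolean; // the agent can find everything it needs
  evaluable: boolean; // there is a way to measure quality
  missionCritical: boolean;
};

function isGoodAgentTask(task: TaskProfile): boolean {
  return task.scopeable && task.evaluable && !task.missionCritical;
}

// Agent works, you evaluate, you keep what is reasonable.
// `evaluate` stands in for whatever quality measure makes the task evaluable.
function keepReasonable<T>(candidates: T[], evaluate: (candidate: T) => number, minScore: number): T[] {
  return candidates.filter((candidate) => evaluate(candidate) >= minScore);
}

const reproCase: TaskProfile = { scopeable: true, evaluable: true, missionCritical: false };
const billingRewrite: TaskProfile = { scopeable: true, evaluable: true, missionCritical: true };
const delegate = [reproCase, billingRewrite].filter(isGoodAgentTask); // only reproCase
```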

Final advice ▶ 16:56

Think about what you're building and WHY
Learn to say no — "your most valuable capability right now"
Fewer features, but polish them
Cap the amount of generated code you review
Non-critical code: "five slop ahead." Critical code: "read every line."
If you do anything important, write it by hand
"That friction builds the understanding of the system in your head"
"All of this still requires humans."

🔑 Key Takeaways

Claude Code's biggest problem: you don't control your own context
Terminal Bench proves minimal harnesses can outperform feature-rich ones
Pi's philosophy: 4 tools, ~empty system prompt, self-modifying via extensions
Models are already reinforcement-trained as coding agents — they don't need verbose instructions
Extensions > built-in features: let Pi build what YOU need
AI-generated PRs ("clankers") are destroying open source — require human vouching
Agents compound errors with zero learning and no pain mechanism
"A sufficiently detailed spec is a program" — blanks get filled with internet garbage
Humans are bottlenecks and that's a FEATURE, not a bug
Critical code: write by hand. The friction IS the learning.

🕐 Timestamp Index

0:29 Act One: why he stopped using Claude Code
1:01 Claude Code's context control problems
2:35 Zero observability, model choice, extensibility
3:09 Alternatives: Amp, FactoryDroid, OpenCode
4:46 Terminal Bench revelation
5:46 Pi's architecture: 4 packages, minimal prompt
7:01 Four tools: read, write, edit, bash
7:27 YOLO security model
8:02 Extensions: TypeScript modules, hot reload
9:22 Examples: /byTheWay, chat rooms, Doom
10:01 "You tell Pi to build it for you"
10:57 Act Two: clankers destroying OSS
11:14 Anti-clanker vouching system
12:04 Act Three: slow down, everything's broken
13:00 Compounding booboos visualization
14:00 Why agents produce complexity
16:12 How we should work: good agent tasks
16:56 Final advice: say no, read the code, write by hand

Building Pi in a World of Slop — Deep Dive