Matt Pocock walks through the 5 Claude Code skills he uses every single day to steer AI agents and dramatically improve code quality. The skills form a complete development pipeline: from grilling an idea into shared understanding, to writing a PRD, breaking it into vertical slices, enforcing TDD, and continuously improving codebase architecture. Each skill encodes a strict process so the AI has a well-defined path to follow — because agents have no memory, and process has never been more important.
Matt frames the core problem: developers now have access to a fleet of middling-to-good engineers deployable at any time. But these AI engineers have a critical flaw — they have no memory. They don't remember what they've done before. This means you need extremely strict, well-defined processes to get agents to produce useful work.
The solution is skills: small markdown files that encode your process so the AI has a strict path to walk down every time. Matt's skills repo contains everything he uses, and since he started using them, the quality of the code the AI produces has "shot up."
Key insight: Skills don't have to be long to be impactful. You just have to choose the right words for the LLM at the right time.
Matt's favorite skill, Grill Me, is only three sentences long:
"Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one by one. If a question can be answered by exploring the codebase, explore the codebase instead."
The concept of the design tree comes from Frederick P. Brooks' The Design of Design. When approaching a design, you walk down all branches of a tree of decisions. For example: designing a search page → advanced search or text box? → if advanced, what filters? What sorting? You keep walking until the design is fully fleshed out before committing to code.
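The design tree can be pictured as a small data structure. The sketch below is illustrative only: the `Decision` class and `walk` function are hypothetical names, and the tree encodes just the search-page example above. It shows how choosing one answer opens up the follow-up questions that depend on it.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One node in a design tree: a question plus the follow-ups each answer opens up."""
    question: str
    # Maps an answer to the decisions that only matter once that answer is chosen.
    branches: dict[str, list["Decision"]] = field(default_factory=dict)


def walk(decision: Decision, answers: dict[str, str]) -> list[str]:
    """Depth-first walk: return the questions asked, in order, for the given answers."""
    asked = [decision.question]
    chosen = answers.get(decision.question)
    for follow_up in decision.branches.get(chosen, []):
        asked.extend(walk(follow_up, answers))
    return asked


ROOT_QUESTION = "Advanced search or a plain text box?"

search_page = Decision(
    ROOT_QUESTION,
    branches={
        "advanced": [Decision("Which filters?"), Decision("Which sort orders?")],
        "text box": [],
    },
)

# Choosing "advanced" exposes two dependent decisions; "text box" exposes none.
print(walk(search_page, {ROOT_QUESTION: "advanced"}))
print(walk(search_page, {ROOT_QUESTION: "text box"}))
```

Resolving a parent decision before its children is what "resolving dependencies between decisions one by one" amounts to in this picture.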
Matt invokes /grill-me when he wants to reach a shared understanding with the LLM. His complaint with Claude Code's default plan mode: it tends to spit out a plan too early, creating a document before he feels alignment has been reached. Grill Me forces the conversation.
Real example: Adding a feature to his course video editor codebase. After providing a research markdown file, he invoked Grill Me. The AI asked 16 questions — and that was a relatively short session. He's had sessions of 30–45 minutes with 30–50 questions on complex features.
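To make the "small markdown file" idea concrete, here is a rough sketch of packaging the three-sentence prompt quoted above as a skill file. The frontmatter layout (`name`, `description`) and the description text are assumptions for illustration, not taken from Matt's repo; `render_skill` is a hypothetical helper.

```python
def render_skill(name: str, description: str, body: str) -> str:
    """Return the text of a skill file: a short frontmatter header, then the instructions."""
    # Assumed layout: YAML-style frontmatter followed by the prompt body.
    return f"---\nname: {name}\ndescription: {description}\n---\n\n{body.strip()}\n"


# The prompt text is the one quoted in the notes above.
GRILL_ME = (
    "Interview me relentlessly about every aspect of this plan until we reach "
    "a shared understanding. Walk down each branch of the design tree, "
    "resolving dependencies between decisions one by one. If a question can be "
    "answered by exploring the codebase, explore the codebase instead."
)

skill_text = render_skill(
    "grill-me",
    "Stress-test a plan through relentless questioning.",  # hypothetical description
    GRILL_ME,
)
print(skill_text)
```

The point of the sketch is the size: the whole skill is a header plus three sentences.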
Once Matt has reached shared understanding through grilling, he invokes the Write a PRD skill to convert that understanding into a durable document. The skill's process:
Ask for a long, detailed description of what the user wants
Explore the repo to verify the user's assertions
Interview relentlessly — a copy of the grill-me process (can be skipped if grilling already happened)
Sketch out major modules that need to be built or modified
Write the PRD using a template, submitted as a GitHub issue
Real example: A PRD for adding a split-pane document editing experience to an AI chat feature. The PRD included a problem statement, a detailed solution description, and, critically, many user stories in the Agile style, describing the desired behavior in plain language rather than code. The implementation decisions are kept deliberately non-prescriptive so the PRD stays durable even as the code evolves.
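As a rough illustration of the PRD shape described above, the sketch below renders the four parts named in the notes (problem statement, solution, user stories, implementation decisions) into an issue body. The section headings, the `PRD` class, `render_prd`, and the example content are all hypothetical; Matt's actual template may differ.

```python
from dataclasses import dataclass


@dataclass
class PRD:
    title: str
    problem: str
    solution: str
    user_stories: list[str]
    implementation_decisions: list[str]


def render_prd(prd: PRD) -> str:
    """Render the PRD as a text body suitable for pasting into an issue."""
    lines = [
        f"# {prd.title}", "",
        "## Problem Statement", prd.problem, "",
        "## Solution", prd.solution, "",
        "## User Stories",
    ]
    # User stories are numbered so later issues can refer to them.
    lines += [f"{i}. {story}" for i, story in enumerate(prd.user_stories, start=1)]
    lines += ["", "## Implementation Decisions"]
    lines += [f"- {decision}" for decision in prd.implementation_decisions]
    return "\n".join(lines)


# Hypothetical content loosely based on the split-pane example.
example = PRD(
    title="Split-pane document editing",
    problem="Users cannot edit a document while keeping the chat visible.",
    solution="Show the document in a pane next to the chat and let both stay in view.",
    user_stories=[
        "As a user, I want to edit a document beside the chat, so that I keep my context.",
    ],
    implementation_decisions=["The editing engine is a separate, testable module."],
)

body = render_prd(example)
print(body)
```

Note that the implementation decisions stay at the level of module boundaries, not file names or function signatures, which is what keeps the document durable.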
The PRD describes the destination — but not the journey. It says where you're going, not how you'll get there. The next skill bridges that gap.
The PRD to Issues skill (/prd-to-issues) takes the PRD (the destination) and turns it into a Kanban board of independent, grabbable issues. The process:
Locate the PRD — fetch it if not already in context
Explore the codebase if needed
Draft vertical slices — the critical step
Breaking a PRD into tasks is something developers have done for years and developed intuition for. Matt's principle: break it into tasks that flush out unknown unknowns quickly. If you're integrating with a new service or connecting two things that haven't been connected before, do that work first — it gives you feedback on whether your approach is even valid.
The tracer bullet analogy: each issue is a thin vertical slice that cuts through all integration layers, not a horizontal slice of one layer.
Real example: The complex document-editing PRD got broken into just 4 vertical slices:
Slice 1: The editing engine with tests — the foundation that powers everything else. If this doesn't work, everything fails, so flush it out first.
Slice 2: Not blocked by anything — can be picked up independently (great for parallel agents)
Slice 3: Blocked by Slice 1
Slice 4: Blocked by Slice 2
The skill also establishes blocking relationships between tasks — essential for parallel agent setups and for adding future QA issues. Each issue references the parent PRD so agents can fetch and read it. Matt's Ralph Loop then picks up each issue, implements it, comments on it, closes it, and unblocks the next one.
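The blocking relationships form a small dependency graph. The sketch below is not Matt's Ralph Loop; it is a minimal illustration, with hypothetical names (`blocked_by`, `ready_issues`, `run_in_waves`), of how the four slices above could be scheduled so that unblocked issues are worked in parallel.

```python
# Slice numbers and blocking edges follow the four-slice example above.
blocked_by: dict[int, set[int]] = {1: set(), 2: set(), 3: {1}, 4: {2}}


def ready_issues(blocked_by: dict[int, set[int]], closed: set[int]) -> list[int]:
    """Open issues whose blockers have all been closed."""
    return sorted(
        issue
        for issue, blockers in blocked_by.items()
        if issue not in closed and blockers <= closed
    )


def run_in_waves(blocked_by: dict[int, set[int]]) -> list[list[int]]:
    """Group issues into waves; every issue in a wave can go to a separate agent."""
    closed: set[int] = set()
    waves: list[list[int]] = []
    while len(closed) < len(blocked_by):
        wave = ready_issues(blocked_by, closed)
        if not wave:
            raise ValueError("blocking relationships contain a cycle")
        waves.append(wave)
        closed.update(wave)  # closing an issue unblocks its dependents
    return waves


print(run_in_waves(blocked_by))  # [[1, 2], [3, 4]]
```

Slices 1 and 2 can start immediately, and closing each one unblocks exactly one follow-up slice.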
With a plan and issues in place, the question becomes: how do you make the implementation rock-solid? The answer is Test-Driven Development.
Unlike Matt's other skills, the TDD skill is unusually long — it includes philosophy on refactoring, mocking, and deep modules. The workflow:
Confirm with the user what interface changes are needed — the most important step. When an AI looks at a bad codebase (many tiny undifferentiated modules), it struggles to understand responsibilities, dependencies, and structure. Restructuring into larger modules with thin interfaces makes the codebase easier for AI to navigate and test.
Confirm which behaviors to test
Design interfaces for testability
Red-Green-Refactor loop: Write one failing test → write code to make it pass → look for refactor candidates → repeat until complete
Red-green-refactor with agents is incredible: it has been the most consistent way Matt has improved agent output quality.
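One turn of the red-green loop can be sketched in a few lines. The `slugify` function and its expected behavior are hypothetical, chosen only to keep the example small. The check is written first against the public interface, fails against a stub (red), and passes once the smallest sufficient implementation exists (green).

```python
def check_slugify(slugify) -> None:
    """Behavior-level check written first; it only touches the public interface."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Split   Pane  ") == "split-pane"


def passes(check, implementation) -> bool:
    """Run a check against an implementation and report whether it passed."""
    try:
        check(implementation)
        return True
    except AssertionError:
        return False


# Red: the stub exists so the check can run, but it does not satisfy it yet.
def slugify_stub(title: str) -> str:
    return title


# Green: the smallest implementation that makes the check pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


print(passes(check_slugify, slugify_stub))  # False
print(passes(check_slugify, slugify))       # True
```

The refactor step would then reshape `slugify` while the same check keeps passing, before the next failing check is written.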
Caveat on refactoring: LLMs are quite reluctant to refactor their own code while it sits in their context window. They become "precious" about code they just wrote. Clearing the context would help, but within a session, don't expect aggressive refactoring.
Matt uses the TDD skill to prompt his Ralph Loops (autonomous agent loops that iterate through issues).
TDD demands a well-structured codebase in which test boundaries are clear. The Improve Codebase Architecture skill (/improve-codebase-architecture) continuously improves the codebase to make it more amenable to AI-driven development. The process:
Explore the codebase naturally — surface what the AI finds confusing:
Where does understanding one concept require bouncing between many small files?
Where have pure functions been extracted just for testability, but the real bugs hide in how they're called?
Where do tightly coupled modules create integration risk in the seams between them?
Present candidates — a numbered list of "deepening opportunities" (chances to consolidate shallow modules into deeper ones)
User picks a candidate
Design multiple interfaces — spawn 3+ sub-agents in parallel, each producing a radically different interface for the deepened module. This generates diverse design options to compare.
Recommend and optionally hybridize — suggest the strongest design and propose a hybrid if elements from different designs combine well
Create a GitHub issue — a refactor RFC issue, which Matt then feeds back into /prd-to-issues for implementation planning
You don't need to know a lot about interface design to use this skill. It's language-agnostic and codebase-agnostic — just run it anywhere and get a decent answer for how things could be improved.
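A small sketch of what a "deepening opportunity" might look like. The clip-duration domain and all function names are hypothetical. In the shallow layout, each helper would be a public piece in its own file and every caller would have to remember the right order; in the deepened version, one entry point owns the sequence and is the natural test boundary.

```python
# Helpers that a shallow layout would expose separately; here they are private details.
def _parse_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def _check_range(start: int, end: int) -> None:
    """Reject ranges that are negative, empty, or reversed."""
    if not 0 <= start < end:
        raise ValueError(f"invalid clip range: {start}..{end}")


# Deep module: one thin interface hides parsing, validation, and arithmetic.
def clip_duration(start_stamp: str, end_stamp: str) -> int:
    """Return the clip length in seconds for two 'MM:SS' timestamps."""
    start = _parse_seconds(start_stamp)
    end = _parse_seconds(end_stamp)
    _check_range(start, end)
    return end - start


print(clip_duration("01:30", "02:15"))  # 45
```

Testing through `clip_duration` exercises how the helpers are combined, which is where the notes above say the real bugs tend to hide.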
Recommended cadence: Run it once a week to identify opportunities, or after a surge of development when you've added a whole new wing of features. Over time, as you keep refining your codebase, the quality of agent output goes up — because a garbage codebase produces garbage AI output.
The Complete Pipeline
Together, the 5 skills form a coherent development workflow:
/grill-me → Flesh out an idea through relentless questioning (shared understanding)
/write-a-prd → Turn that understanding into a durable destination document
/prd-to-issues → Break the destination into a journey of vertical slices
/tdd → Execute each slice with red-green-refactor discipline
/improve-codebase-architecture → Continuously improve the codebase so all of the above works better
"If you took all of these skills and said 'this is a little mini markdown book of processes for humans,' it wouldn't look out of place. The most successful way to get code quality up from agents is to treat them like humans — humans with weird constraints, sure. Humans that have no memory and are cloned, come out of the birthing pod and go right to work."
🎯 Key Takeaways
Process has never been more important. AI agents have no memory — strict, well-defined processes are the only way to get consistent quality.
Skills don't have to be long. /grill-me is three sentences, yet it drives deep-dive sessions of 16 to 50 questions that dramatically improve alignment.
Claude Code's plan mode is too eager. It wants to produce a document before shared understanding is reached — /grill-me forces the conversation first.
PRDs describe the destination, not the journey. Keep implementation details non-prescriptive so the PRD stays durable as code evolves.
Break PRDs into vertical slices, not horizontal layers. Each issue should be a thin tracer bullet through all integration layers, flushing out unknown unknowns first.
Blocking relationships enable parallel agents. Independent slices can be picked up by multiple agents simultaneously.
TDD is the most consistent way to improve agent output. Red-green-refactor forces small steps and catches problems early.
LLMs are reluctant to refactor their own code while it's in context — they become "precious" about what they just wrote.
Deep modules beat shallow modules — both for human understanding and AI navigation. Restructure into larger modules with thin interfaces.
Continuously improve your codebase architecture. A garbage codebase produces garbage AI output. Run /improve-codebase-architecture regularly.
Treat agents like humans with weird constraints. The best processes for AI agents are the same engineering processes that work for human teams.