Cole Medin breaks down harness engineering โ the 2026 evolution of context engineering โ explaining that it's not just about feeding context to an LLM, but about building the entire wrapper around the model: rules, skills, hooks, sub-agents, and orchestration layers. The video covers two key dimensions: the AI layer within a single coding agent session, and the more powerful practice of orchestrating multiple agent sessions into automated workflows (like the RALF loop). Most importantly, Cole argues that harness engineering is a mindset: every model mistake is an opportunity to improve your harness, not a reason to wait for the next model version.
Cole opens by noting that "harness engineering" is rapidly becoming the next big buzzword in the AI space for 2026, much like "context engineering" was for 2025. But just like its predecessor, people are already throwing the term around without truly understanding what it means.
The core question he poses: Is this skill โ or even mindset โ worth learning? His answer is an emphatic yes. The video promises a breakdown in under 15 minutes with concrete examples and demos to make it tangible.
Harness engineering is the next evolution of context engineering โ and it's already becoming a buzzword people use without understanding it.
At its core, harness engineering is about building the wrapper around the model. Any AI agent is the combination of two things:
The underlying LLM โ GPT, Claude, etc. โ providing the reasoning capability
The wrapper (harness) โ everything around the model that gives it context, defines processes, and provides capabilities
While the video focuses primarily on AI coding assistants, Cole emphasizes that the concept of harness engineering can be extrapolated to any agent you build for anything.
There are two distinct parts of harness engineering:
Within a single coding agent session โ the AI layer, rules, skills, and context you inject
Across multiple sessions โ orchestrating many coding agent sessions into a larger workflow to handle bigger tasks (the real evolution)
An agent = LLM + Harness. The harness is everything you build around the model โ and it's the part you actually control.
Cole presents a layered diagram that makes the architecture crystal clear. From inside out:
Layer 1 โ The LLM: The raw reasoning engine (GPT, Claude, etc.). This is the foundation but by itself it can't do much โ no file system access, no command execution, nothing practical.
Layer 2 โ The Coding Agent Tool: Claude Code, Codex, Pi, and others. These are harnesses that companies have already engineered around the model. You don't build this layer โ you pick it when you choose your tool. "Some people think Claude Code is the best harness for coding. Some people think Codex is."
Layer 3 โ Your AI Layer: This is the ultimate wrapper and the part you actually get to build. It defines all your context and processes for coding agents.
MCP servers โ external capabilities and integrations
Codebase searching โ LSP, knowledge graphs
Hooks โ pre/post tool-use automation
Sub-agents โ delegated specialized tasks
"We take for granted all of the capabilities that AI coding assistants give to the model out of the box. An LLM by itself doesn't have any way to access a file system or run any commands."
Cole addresses the elephant in the room: isn't this just context engineering? The answer is "yes, to an extent" โ and that's exactly why it's becoming a buzzword. Most people don't understand the true evolution.
He outlines two key distinctions, referencing a Martin Fowler article:
Most of the harness IS context engineering โ context injection, actions through tools and MCPs, persistence, observability. These haven't changed.
The new element is control โ RALF loops, orchestrating different coding agent sessions, and sub-agents. This is a true evolution beyond what "context engineering" covered.
The article breaks the harness down into: context injection, actions (tools, MCPs), persistence, observability, and control. The first four are essentially context engineering. The fifth โ control โ is what makes harness engineering genuinely new.
Context engineering โ Harness engineering. The "control" layer โ orchestrating sessions and RALF loops โ is the true evolution.
Beyond being a skill, harness engineering is fundamentally a mindset reframe. Cole quotes from an article by Addy Osmani:
"There's a pattern I watch engineers fall into. The agent does something dumb, the engineer blames the model, and the blame gets filed under 'wait for the next version.'"
This is the anti-pattern: Claude messes up โ "let's wait for Opus 5." GPT fails โ "let's wait for GPT-6." Cole admits he's tempted to think this way too. But the harness engineering mindset rejects that default.
Instead, the approach is what Cole calls "system evolution":
The agent didn't know about a convention? โ Add it to agents.md
The agent ran a destructive command? โ Add a hook that blocks it
Any mistake? โ An opportunity to improve your harness
This creates a virtuous cycle: you become the human steering the system, feeding forward with principles and context for generation, and sensors (hooks, review agents, skills) for feedback and self-correction โ evolving your AI layer over time.
"Every mistake becomes an opportunity to improve your harness. You're taking ownership and improving the performance of your coding agent over time with the AI layer that you control."
6 Components of the AI Layer โ Companion Repo โถ 9:25
Cole walks through his companion repo (harness-engineering-demo) which provides a concrete, reusable AI layer template. Here are the key components:
Rules (Foundation):
Global rules define the constraints, conventions, and patterns your coding agent must follow
Also includes on-demand context as markdown/Confluence documents
Skills (Workflows):
Separate skills for plan, implement, and validate
Each skill runs in a separate coding agent session for token efficiency and focus
Each skill outputs an artifact (e.g., markdown plan document) that serves as a handoff to the next session
Hooks (Underused Power):
Pre-tool-use security hook โ triggers before any tool call (file write, command execution) to block destructive operations like reading sensitive files or running rm -rf
Stop validation hook โ when the agent says "I'm done," deterministically runs unit tests, linting, and type checking. If anything fails, it forces the agent to iterate until everything passes
Post-edit lint hook โ runs a quick lint after every single file edit to keep the codebase clean, which also makes future agent sessions more reliable
The repo also includes instructions for running a basic PIT (Plan โ Implement โ Test) workflow manually: plan with the plan skill, iterate, produce a markdown document, hand it off to the implement skill in a separate session.
"Hooks are honestly pretty underused. I love using hooks for security (pre-tool-use), stop validation (force tests to pass), and post-edit linting."
This is what Cole calls the "peak evolution of harness engineering" โ the real power move. The core idea:
Don't hand a massive task or PRD to a single coding agent session โ it won't be token efficient and the LLM will be completely overwhelmed
No matter how good your AI layer is, if you send too much into the LLM at once, "it is going to fall flat on its face"
Instead, give each coding agent a very focused task
The example harness workflow Cole describes:
Explore โ one agent explores the implementation from a user requirement
Plan โ one agent writes the plan
Implement โ one agent handles the implementation
Review (parallel) โ multiple review agents run simultaneously, each with a different focus:
Security review agent
Correctness review agent
Simplicity review agent
Decision gate โ if all reviews pass โ create the PR. Otherwise โ iterate back to implementation.
You can do all of this manually (opening separate Claude Code sessions, copying plan documents between them), but the real power of harness engineering is automating the entire pipeline.
"If you send too much into the LLM at once, it is going to fall flat on its face. Give each coding agent a very focused task."
The RALF loop (created by Jeffrey Huntley) is one of the first and most influential examples of an agent harness that automates multi-session orchestration. Cole walks through how it works:
How it works:
A simple script (Python or Bash) takes a larger scope of work (e.g., a massive PRD) as input
It splits that scope into individual tasks
It runs coding agent sessions (Claude Code, Codex, etc.) to handle them one at a time until everything is done
The loop mechanics:
You provide a prompt with the spec items you want built
The system produces a fix plan โ what to do in iteration 1, iteration 2, iteration 3, etc.
It builds up a log as different Claude Code sessions run
When the system decides it's done, it produces an indicator (e.g., a done.txt file)
The exit condition: the main while loop only exits when done.txt exists AND the coding agent is confident the implementation is complete with all validation passing
Cole also mentions Archon, his open-source harness builder, as the easiest way to get started building custom harnesses tailored to your exact process and software development lifecycle.
"We are using many coding agent sessions to keep each one very focused, but also we're automating it so we don't have to babysit our coding agent. This really is the future of agentic engineering."
๐ฏ Key Takeaways
Harness = Everything Around the Model โ Any agent is the combination of the LLM (reasoning) plus the harness (context, tools, processes). The harness is the part that matters most and the part you control.
Three Layers of AI Agents โ The LLM at the core, the coding agent tool (Claude Code, Codex) as the first harness, and your AI layer as the ultimate wrapper that you build and evolve.
Six AI Layer Components โ Global rules, skills, MCP servers, codebase searching (LSP/knowledge graphs), hooks, and sub-agents. Every process you inject goes through one of these six channels.
Context Engineering โ Harness Engineering โ Most of the harness IS context engineering. The new element is "control" โ orchestrating sessions, RALF loops, and sub-agents.
Mindset Over Skill โ The harness engineering mindset rejects "blame the model, wait for the next version." Instead, every mistake becomes an opportunity to improve your harness.
System Evolution โ Treat your AI layer as a living system. Convention missed? Add it to rules. Destructive command? Add a hook. Each session should leave your harness better than before.
Separate Sessions for Plan, Implement, Validate โ Keep each coding agent session focused and token-efficient by using separate skills/sessions for planning, implementation, and validation, with artifact handoffs between them.
Hooks Are Underused โ Pre-tool-use security hooks, stop validation hooks (force tests to pass before "done"), and post-edit lint hooks are powerful but most engineers don't use them enough.
Don't Overwhelm a Single Session โ Never hand a massive PRD to one agent session. No matter how good your AI layer is, too much context will make the LLM "fall flat on its face."
Orchestrate Multiple Sessions โ The peak of harness engineering: automate pipelines where focused agent sessions handle explore โ plan โ implement โ review โ PR creation.
The RALF Loop Pattern โ A simple script that splits a large scope of work into tasks, runs coding agent sessions iteratively, and only exits when all specs are met and validation passes.
Building Harnesses Is the Future โ As models and tools get more powerful, the competitive advantage shifts to who builds the best harness โ the orchestration layer that makes agents reliably tackle larger scopes of work.
Martin Fowler โ Harness Engineering โ Article breaking down the five components of the harness: context injection, actions, persistence, observability, and control