What Is an Agent Harness? Why the Harness, Not the Model, Is the 2026 Moat

An agent harness is everything in an AI agent system except the model itself: the loop that calls the model, the tools it is allowed to use, the memory it keeps between sessions, the permission rules, and the recovery logic when something breaks. The simplest way to remember it is the formula Agent = Model + Harness. In 2026 the harness is the part that decides whether your agent actually works, because the model has become a swappable component.

The proof: 98.4% of Claude Code is harness

When the source of Anthropic's Claude Code was reportedly exposed through a published npm package in March 2026, researchers who read it found something blunt. Out of about 512,000 lines of code, only 1.6% was the logic that talks to the AI model. The other 98.4% was harness: tool handling, file reading, permission checks, error recovery, and context management (reported figures from the leaked package).

Read that again. The famous part, the model, is a thin slice. The thing people actually paid for, the part that makes the agent reliable, is the wrapper around it.

That is the whole argument in one number. The model is a commodity you plug in. The harness is the product.

What "harness" means, in plain words

Two terms get mixed up, so here is the clean split (the consensus used by writers like Birgitta Böckeler at Thoughtworks and the Hugging Face team):

Harness = the execution layer. It is the machinery that runs the agent: the control loop (the repeating cycle of "ask the model, do what it says, feed the result back"), the tool dispatch, and the stop conditions that decide when the agent is done.
Scaffold = the behavior layer. It is what shapes how the agent thinks: the system prompt, the descriptions of each tool, and the context you feed in.

A quick analogy. The model is a very smart new employee. The scaffold is their job description and training. The harness is the office, the keys to the building, the approval forms, and the manager who checks their work before it ships. A genius with no office, no keys, and no review process does not get much done.
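
To make the execution layer concrete, here is a minimal sketch of a control loop with tool dispatch and two stop conditions. It is an illustration, not how Claude Code or any real harness is written. Every name in it (ModelReply, run_agent, scripted_model, TOOLS) is made up for this example, and the "model" is a scripted stand-in rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ModelReply:
    """What the hypothetical model returns on each turn."""
    text: str
    tool: Optional[str] = None  # tool to call next, or None when the model is done
    argument: str = ""


def run_agent(
    model: Callable[[List[str]], ModelReply],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Ask the model, do what it says, feed the result back, until a stop condition."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        reply = model(history)
        if reply.tool is None:
            # Stop condition 1: the model gave a final answer.
            return reply.text
        handler = tools.get(reply.tool)
        if handler is None:
            # Permission rule: tools outside the allow-list are never executed.
            result = f"error: unknown tool {reply.tool!r}"
        else:
            try:
                result = handler(reply.argument)
            except Exception as exc:
                # Recovery: report the failure to the model instead of crashing.
                result = f"error: {exc}"
        history.append(f"{reply.tool}({reply.argument}) -> {result}")
    # Stop condition 2: the step budget ran out.
    return "stopped: step budget exhausted"


def scripted_model(history: List[str]) -> ModelReply:
    """Stand-in for a real model: requests one tool call, then answers."""
    if len(history) == 1:
        return ModelReply(text="", tool="add", argument="2 3")
    return ModelReply(text=f"done after: {history[-1]}")


TOOLS = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}

print(run_agent(scripted_model, TOOLS, "add 2 and 3"))
# prints: done after: add(2 3) -> 5
```

Notice that nothing in run_agent depends on which model sits behind the model argument. That is the swappable part. The allow-list, the error handling, and the step budget are harness decisions.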

Why the harness beats the model on performance

This is not a style opinion. The benchmark numbers are large.

The same model scored 46% on one harness and 80% on another, a 34-point swing with zero change to the model (Cursor research).
On SWE-bench Pro, a coding benchmark, harness choices alone moved scores by 10 to 20 points.
Changing only the harness format improved 15 different language models by 5 to 14 points each, while cutting output tokens by about 20% (so it got better and cheaper at once).

Put simply: picking a better harness can help you more than upgrading to a smarter frontier model. Here is where performance actually comes from.

Factor	Controlled by Model	Controlled by Harness
Benchmark score swings (10 to 34 points)	No	Yes, harness choice
Memory across sessions	No	Yes, initializer plus checkpoint design
Tool access and permissions	No	Yes, MCP servers, hooks
Error recovery	No	Yes, quality gates, build fixers
Switching cost and lock-in	Low, swap the model easily	High, the harness holds accumulated work
Share of the Claude Code codebase	1.6%	98.4%

The four levers you control in a Claude Code harness

If you run agents on Claude Code, you tune the harness with four things. Each one is simple to use.

MCP servers. MCP (Model Context Protocol) is a standard way to plug external tools into the agent, like a database, your GitHub, or a payment system. See MCP servers for how this wiring works.
Skills. Packaged instructions for a specific job that get injected only when relevant, so the agent knows your domain without bloating every prompt.
Hooks. Scripts that fire at set moments in the agent's life. Claude Code exposes 27 lifecycle events, from PreToolUse (just before a tool runs) to SubagentStop (when a helper agent finishes). Hooks let you block, check, or log actions automatically.
The CLAUDE.md hierarchy. A plain text file of rules the agent reads every session. This is the cheapest form of harness work. Each line should trace back to a real failure you prevented. See CLAUDE.md for the pattern, and Claude Code subagents for splitting work across helpers.
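
As an illustration of the hooks lever, here is a sketch of a script that could be registered as a PreToolUse hook to block risky shell commands. It rests on assumptions you should check against the Claude Code hooks reference: that the hook receives a JSON event on stdin with tool_name and tool_input fields, and that exiting with code 2 blocks the tool call and passes stderr back to the agent. The deny-list and the function names are hypothetical.

```python
import json
import sys
from typing import Tuple

# Hypothetical deny-list; in practice each entry should trace back to a real failure.
BLOCKED_FRAGMENTS = ("rm -rf", "git push --force")


def decide(event: dict) -> Tuple[int, str]:
    """Return (exit_code, message) for one hook event.

    Assumed event shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}.
    Assumed exit codes: 0 lets the tool run, 2 blocks it.
    """
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for fragment in BLOCKED_FRAGMENTS:
        if fragment in command:
            return 2, f"blocked by harness rule: {fragment!r} is not allowed"
    return 0, ""


def main() -> int:
    """Read one event from stdin, print any block reason to stderr."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


# When saved as a standalone hook script, finish the file with:
#     sys.exit(main())
```

Keeping the decision in a pure function (decide) means the rule can be unit-tested without running an agent, which fits the idea that each rule should correspond to a failure you have actually seen.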

Three ways to get a harness

You have three honest options, with real tradeoffs.

Build your own. This is the approach of Mitchell Hashimoto, the developer who coined "harness engineering" in February 2026. It works, but it takes months of watching failures and patching them one by one. Full control, slow start.
Adopt a closed platform harness. Tools like LangChain Deep Agents, Codex, or raw Claude Code give you good performance fast. The catch: you do not own the harness, your memory often lives inside someone else's system, and switching later is painful.
Buy a harness you own outright. You get a working, domain-specific harness as code in your own repo. You can read it, change it, and keep it. This is the gap the $29 Code Kit fills: a complete build system for Claude Code, with agents, skills, hooks, and a production SaaS skeleton (auth, Stripe payments, PostgreSQL with row-level security on every table) already wired together.

Memory is the hidden lock-in

LangChain's April 2026 post "Your Harness, Your Memory" made a sharp point: whoever controls the harness controls the agent's memory. A closed harness is a switching-cost trap, because all the behavior you taught it stays behind when you leave.

An owned harness is the opposite. Your CLAUDE.md rules, your hooks, and your quality gates are files you keep. They are portable, and they compound. Every fix you add makes next month's agent better, and none of it is hostage to a vendor.

Harness engineering is a loop, not a launch

Harness engineering is not a one-time setup. The method, in Hashimoto's framing, is a simple loop: watch the agent fail, build a structural fix so that exact failure cannot happen again, repeat. Never let the same mistake happen twice.

In practice that means CLAUDE.md files are living documents. Quality gates (automatic checks like type-check, lint, and build) act as sensors, in Böckeler's framing, that catch a bad change before it ships. Every rule you add is compounding intellectual property. The model stays the same. Your harness keeps getting smarter.
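
A quality gate can be as small as a function that runs each check in order and reports the first one that fails. The sketch below assumes each gate is a command that exits non-zero on failure. The gate names and the commands in EXAMPLE_GATES are placeholders for whatever your project uses, and first_failing_gate is a hypothetical helper, not part of any tool.

```python
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple

Gate = Tuple[str, Sequence[str]]  # (human-readable name, command to run)

# Placeholder gates for a TypeScript project; swap in your own commands.
EXAMPLE_GATES: List[Gate] = [
    ("type-check", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npx", "eslint", "."]),
    ("build", ["npm", "run", "build"]),
]


def first_failing_gate(gates: List[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first failure, or None if all pass."""
    for name, command in gates:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            return name
    return None


# Demo with trivial commands so the sketch runs anywhere Python is installed.
demo_gates: List[Gate] = [
    ("always-passes", [sys.executable, "-c", "pass"]),
    ("always-fails", [sys.executable, "-c", "import sys; sys.exit(1)"]),
]
print(first_failing_gate(demo_gates))  # prints: always-fails
```

Stopping at the first failure keeps the signal short: the agent, or you, gets one named sensor to fix before anything ships.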

FAQ

What is an agent harness?

An agent harness is everything in an AI agent system except the model itself: the control loop that calls the model, the tools it can use, the memory it carries between sessions, the permission rules, the context management, and the recovery logic when something fails.

Does the model or the harness matter more for AI agent performance?

The harness. Benchmark research shows the same model can score 46% on a weak harness and 80% on a strong one, a 34-point swing with no change to the model. Harness quality now produces bigger performance differences than switching between frontier models.

What is harness engineering?

Harness engineering is the practice of watching agent failures and building structural fixes so the same mistake cannot recur. Mitchell Hashimoto coined the term in February 2026. In practice it means iteratively improving CLAUDE.md files, hooks, quality gates, and tool configurations, not fine-tuning the model.

What is the difference between an agent scaffold and an agent harness?

A scaffold is the behavior layer: system prompts, tool descriptions, and context rules. A harness is the execution layer: the loop that calls the model, handles tool outputs, and decides when to stop. The scaffold tells the agent how to think. The harness runs the agent and controls what it can do.
