Build This Now
Build This Now
リアルなビルド事例State of Claude Code 2026: What 2,500 Public Repos Revealもうボトルネックは「作ること」じゃない新しい堀はディストリビューションだAI開発の本当のボトルネックがQAである理由24時間でMVPが作れる時代の第一原理思考自律性のカーブ:AI エージェントにどこまで自由を渡せるのかアイデアからSaaSへGANループ自己進化するフックトレースからスキルへ配信エージェントAI セキュリティエージェント自律型 AI スウォームAIメールシーケンスAIが自分自身を掃除するAgent Swarm OrchestrationClaude Codeでフルアプリを作る:実際の例非開発者のためのClaude Code:実際の使用例Claude Code for Freelancers: Ship 3x FasterA Security Update from Build This NowThe AI Agent That Deleted a Production Database in 9 SecondsHow to Build Your Own Claude Code Harness (or Buy One)Run Claude Code on a Cheaper Model: DeepSeek and GLM Cost ArbitrageIs Claude Code Just a Thin Wrapper? Inside the Harness DebateHow Much Does It Really Cost to Build a SaaS with Claude Code?How to Cut Your Claude Code Token Bill in HalfDo I Still Need a Boilerplate If I Use Claude Code?Harness vs Boilerplate vs Framework: The Build-System Stack ExplainedHow Long Does Idea to Production Actually Take with Claude Code?Is Vibe Coding Safe? What the Lovable and Moltbook Breaches TeachOwn Your Vercel Analytics: I Built a Drain-to-Postgres PipelineSpec-Driven Development Explained: Why Pros Stopped Vibe CodingState of Vibe-Coded SaaS Security (2026 Data)From Vibe Coding to Production: The Checklist That Stops Data LeaksVibe Coding vs Vibe Engineering vs Agentic Engineering: The 2026 GlossaryWhat Is an Agent Harness? Why the Harness, Not the Model, Is the 2026 Moat
speedy_devvkoen_salo
Blog/Real Builds/What Is an Agent Harness? Why the Harness, Not the Model, Is the 2026 Moat

What Is an Agent Harness? Why the Harness, Not the Model, Is the 2026 Moat

An agent harness is everything around the model: tools, memory, permissions, and the control loop. Here is why the harness is the real 2026 moat.

設定をやめて、構築を始めよう。

AIオーケストレーション付きSaaSビルダーテンプレート。

Published Jun 26, 20267 min readReal Builds hub

An agent harness is everything in an AI agent system except the model itself: the loop that calls the model, the tools it is allowed to use, the memory it keeps between sessions, the permission rules, and the recovery logic when something breaks. The simplest way to remember it is the formula Agent = Model + Harness. In 2026 the harness is the part that decides whether your agent actually works, because the model has become a swappable component.


設定をやめて、構築を始めよう。

AIオーケストレーション付きSaaSビルダーテンプレート。


The proof: 98.4% of Claude Code is harness

When the source of Anthropic's Claude Code was reportedly exposed through a published npm package in March 2026, researchers who read it found something blunt. Out of about 512,000 lines of code, only 1.6% was the logic that talks to the AI model. The other 98.4% was harness: tool handling, file reading, permission checks, error recovery, and context management (reported figures from the leaked package).

Read that again. The famous part, the model, is a thin slice. The thing people actually paid for, the part that makes the agent reliable, is the wrapper around it.

That is the whole argument in one number. The model is a commodity you plug in. The harness is the product.

What "harness" means, in plain words

Two terms get mixed up, so here is the clean split (the consensus used by writers like Birgitta Böckeler at Thoughtworks and the Hugging Face team):

  • Harness = the execution layer. It is the machinery that runs the agent. The control loop (the repeating cycle of "ask the model, do what it says, feed the result back"), the tool dispatch, and the stop conditions that decide when the agent is done.
  • Scaffold = the behavior layer. It is what shapes how the agent thinks. The system prompt, the descriptions of each tool, and the context you feed in.

A quick analogy. The model is a very smart new employee. The scaffold is their job description and training. The harness is the office, the keys to the building, the approval forms, and the manager who checks their work before it ships. A genius with no office, no keys, and no review process does not get much done.

Why the harness beats the model on performance

This is not a style opinion. The benchmark numbers are large.

  • The same model scored 46% on one harness and 80% on another, a 34-point swing with zero change to the model (Cursor research).
  • On SWE-bench Pro, a coding benchmark, harness choices alone moved scores by 10 to 20 points.
  • Changing only the harness format improved 15 different language models by 5 to 14 points each, while cutting output tokens by about 20% (so it got better and cheaper at once).

Put simply: picking a better harness can help you more than upgrading to a smarter frontier model. Here is where performance actually comes from.

FactorControlled by ModelControlled by Harness
Benchmark score swings (10 to 34 points)NoYes, harness choice
Memory across sessionsNoYes, initializer plus checkpoint design
Tool access and permissionsNoYes, MCP servers, hooks
Error recoveryNoYes, quality gates, build fixers
Switching cost and lock-inLow, swap the model easilyHigh, the harness holds accumulated work
Share of the Claude Code codebase1.6%98.4%

The four levers you control in a Claude Code harness

If you run agents on Claude Code, you tune the harness with four things. Each one is plain to use.

  1. MCP servers. MCP (Model Context Protocol) is a standard way to plug external tools into the agent, like a database, your GitHub, or a payment system. See MCP servers for how this wiring works.
  2. Skills. Packaged instructions for a specific job that get injected only when relevant, so the agent knows your domain without bloating every prompt.
  3. Hooks. Scripts that fire at set moments in the agent's life. Claude Code exposes 27 lifecycle events, from PreToolUse (just before a tool runs) to SubagentStop (when a helper agent finishes). Hooks let you block, check, or log actions automatically.
  4. The CLAUDE.md hierarchy. A plain text file of rules the agent reads every session. This is the cheapest form of harness work. Each line should trace back to a real failure you prevented. See CLAUDE.md for the pattern, and Claude Code subagents for splitting work across helpers.

Three ways to get a harness

You have three honest options, with real tradeoffs.

  1. Build your own. This is the Mitchell Hashimoto approach, the developer who coined "harness engineering" in February 2026. It works, but it takes months of watching failures and patching them one by one. Full control, slow start.
  2. Adopt a closed platform harness. Tools like LangChain Deep Agents, Codex, or raw Claude Code give you good performance fast. The catch: you do not own the harness, your memory often lives inside someone else's system, and switching later is painful.
  3. Buy a harness you own outright. You get a working, domain-specific harness as code in your own repo. You can read it, change it, and keep it. This is the gap the $29 Code Kit fills: a complete build system for Claude Code, with agents, skills, hooks, and a production SaaS skeleton (auth, Stripe payments, PostgreSQL with row-level security on every table) already wired together.

Memory is the hidden lock-in

LangChain's April 2026 post "Your Harness, Your Memory" made a sharp point: whoever controls the harness controls the agent's memory. A closed harness is a switching-cost trap, because all the behavior you taught it stays behind when you leave.

An owned harness is the opposite. Your CLAUDE.md rules, your hooks, and your quality gates are files you keep. They are portable, and they compound. Every fix you add makes next month's agent better, and none of it is hostage to a vendor.

Harness engineering is a loop, not a launch

Harness engineering is not a one-time setup. The method, in Hashimoto's framing, is a simple loop: watch the agent fail, build a structural fix so that exact failure cannot happen again, repeat. Never let the same mistake twice.

In practice that means CLAUDE.md files are living documents. Quality gates (automatic checks like type-check, lint, and build) act as sensors, in Böckeler's framing, that catch a bad change before it ships. Every rule you add is compounding intellectual property. The model stays the same. Your harness keeps getting smarter.

FAQ

What is an agent harness?

An agent harness is everything in an AI agent system except the model itself: the control loop that calls the model, the tools it can use, the memory it carries between sessions, the permission rules, the context management, and the recovery logic when something fails.

Does the model or the harness matter more for AI agent performance?

The harness. Benchmark research shows the same model can score 46% on a weak harness and 80% on a strong one, a 34-point swing with no change to the model. Harness quality now produces bigger performance differences than switching between frontier models.

What is harness engineering?

Harness engineering is the practice of watching agent failures and building structural fixes so the same mistake cannot recur. Mitchell Hashimoto coined the term in February 2026. In practice it means iteratively improving CLAUDE.md files, hooks, quality gates, and tool configurations, not fine-tuning the model.

What is the difference between an agent scaffold and an agent harness?

A scaffold is the behavior layer: system prompts, tool descriptions, and context rules. A harness is the execution layer: the loop that calls the model, handles tool outputs, and decides when to stop. The scaffold tells the agent how to think. The harness runs the agent and controls what it can do.

More in Real Builds

  • AIが自分自身を掃除する
    AIの乱雑さを自動的に掃除する3つの夜間Claude Codeワークフロー: slop-cleanerがデッドコードを削除し、/healが壊れたブランチを修復し、/driftがパターンドリフトを捉えます。
  • Agent Swarm Orchestration
    Four infrastructure layers that stop agent swarms from double-claiming tasks, drifting on field names, and collapsing under merge chaos.
  • GANループ
    1つのエージェントが生成し、もう1つが徹底的に批評し、スコアが改善しなくなるまでループする。エージェント定義とルーブリックテンプレートを含むGANループの実装。
  • 自律性のカーブ:AI エージェントにどこまで自由を渡せるのか
    AI エージェントにどれだけ自律性を渡せるかは、たった一つの要素で決まります。モデルが脱線せずにどこまで長くタスクを保てるか、です。優れた harness と信頼できるモデルがそろって、はじめて本物のエージェント作業が動き出します。
  • The AI Agent That Deleted a Production Database in 9 Seconds
    An AI deleted PocketOS's production database and all backups in 9 seconds. Here is why it happened and the guardrails that prevent it.
  • AIメールシーケンス
    Claude Codeの1コマンドで6シーケンス17本のライフサイクルメールを生成し、Inngestの行動トリガーを配線してデプロイ可能な分岐型メールファネルを構築します。

設定をやめて、構築を始めよう。

AIオーケストレーション付きSaaSビルダーテンプレート。

Vibe Coding vs Vibe Engineering vs Agentic Engineering: The 2026 Glossary

Vibe engineering, vibe coding, and agentic engineering defined plainly, with origins, a comparison table, and when to use each in 2026.

On this page

The proof: 98.4% of Claude Code is harness
What "harness" means, in plain words
Why the harness beats the model on performance
The four levers you control in a Claude Code harness
Three ways to get a harness
Memory is the hidden lock-in
Harness engineering is a loop, not a launch
FAQ
What is an agent harness?
Does the model or the harness matter more for AI agent performance?
What is harness engineering?
What is the difference between an agent scaffold and an agent harness?

設定をやめて、構築を始めよう。

AIオーケストレーション付きSaaSビルダーテンプレート。