The GAN Loop

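An AI's first draft is confident slop, every time.

The model is not the problem. Nobody is pushing back on it.

Add a second agent whose only job is to critique the first agent's output without mercy, and quality typically converges within two feedback passes. This pattern is called the GAN loop. What follows is a complete implementation guide.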

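What is a GAN loop?

GAN stands for Generative Adversarial Network. In classical machine learning, two neural networks compete: a generator produces fake images and a discriminator tries to spot the fakes. The two train against each other until the generator is good enough to fool the discriminator.

The same idea applies to AI agents. No neural network training is needed: just two Claude instances with different jobs, and a loop that runs until the output reaches a quality threshold.

The generator produces. The evaluator judges. The generator improves. Repeat until the score clears the threshold or stops improving.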
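
The produce, judge, improve cycle can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: the `generate` and `evaluate` callables and their signatures are hypothetical stand-ins for the two agents, and plateau detection is left out so the loop stops only on the threshold or the iteration cap.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, not taken from the article:
#   generate(brief, feedback) -> draft
#   evaluate(draft) -> (score, feedback)
Generate = Callable[[str, Optional[str]], str]
Evaluate = Callable[[str], Tuple[float, str]]


def gan_loop(
    brief: str,
    generate: Generate,
    evaluate: Evaluate,
    threshold: float = 7.0,
    max_iterations: int = 3,
) -> Tuple[str, float, int]:
    """Alternate generator and evaluator until the score clears the threshold."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    feedback: Optional[str] = None
    draft, score, iteration = "", 0.0, 0
    for iteration in range(1, max_iterations + 1):
        # The generator sees the brief and the latest feedback, never the rubric.
        draft = generate(brief, feedback)
        # The evaluator sees only the draft (plus its own rubric, held elsewhere).
        score, feedback = evaluate(draft)
        if score >= threshold:
            break
    return draft, score, iteration
```

Passing the agents in as plain callables keeps the control flow testable without invoking any model.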

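The problem it solves

Ask a single AI to write something and it produces output that looks reasonable. It is structured. It is confident.

Look closer, though. The hook is generic. The numbers are vague. The CTA could apply to anything. With no outside pressure, the AI optimized for "looks good on first read" rather than "actually performs".

An AI does not know what it does not know. It cannot evaluate its own blind spots.

A second agent that reads the output cold, rubric in hand, catches what the generator missed. And because critique is its only job, its feedback is specific and actionable.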

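The two agents

Every GAN loop has exactly two roles, and they must not be played by the same agent.

The generator has the full context, tools, and creative skills. It produces the best output it can from the brief. It must not see the evaluation rubric in advance: show it the rubric and it optimizes for the rubric rather than for real quality, which defeats the purpose.

The evaluator has the rubric and the output. Nothing else. No memory of the generation process. No emotional investment in the draft. It reads cold, scores every dimension with a rationale, and returns a structured verdict: accept, refine, or reject. When it rejects, it quotes the offending text exactly, explains the problem, and proposes a concrete rewrite.

This separation is everything. The generator is optimistic by nature. The evaluator is adversarial by design. When one agent plays both roles, neither role is done well.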

Agent definitions

Here are the two agent files you need. Save them in your project's .claude/agents/ directory.

Generator agent

---
name: gan-generator
description: "GAN Harness — Generator agent. Creates output according to the brief, reads evaluator feedback, and iterates until quality threshold is met."
tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
model: claude-sonnet-4-6
---

You are the Generator in a GAN-style multi-agent harness.

## Your Role

You are the Creator. You build the output according to the spec.
After each iteration, the Evaluator will score your work.
You then read the feedback and improve.

## Key Rules

1. Read the spec first — always start by reading the brief or spec file
2. Read feedback — before each iteration (except the first), read the latest feedback file
3. Address every issue — feedback items are not suggestions, fix them all
4. Do not self-evaluate — your job is to create, not to judge
5. Commit between iterations — so the Evaluator sees clean diffs

## Workflow

### First Iteration
1. Read the brief / spec
2. Produce the output (post, code, design, document — whatever is specified)
3. Write generator-state.md: what you built, known issues, open questions

### Subsequent Iterations
1. Read feedback/feedback-NNN.md (latest)
2. List every issue the Evaluator raised
3. Fix by priority: critical issues first, then major, then minor
4. Update generator-state.md

## Generator State File

Write to generator-state.md after each iteration:

# Generator State — Iteration NNN

## What Was Built
- [output 1]
- [output 2]

## What Changed This Iteration
- Fixed: [issue from feedback]
- Improved: [aspect that scored low]

## Known Issues
- [anything you could not fix]

Evaluator agent

---
name: gan-evaluator
description: "GAN Harness — Evaluator agent. Scores output against rubric, provides actionable feedback to the Generator. Be ruthlessly strict."
tools: ["Read", "Write", "Grep", "Glob"]
model: claude-sonnet-4-6
---

You are the Evaluator in a GAN-style multi-agent harness.

## Your Role

You are the Critic. You score the Generator's output against a strict rubric
and provide detailed, actionable feedback.

## Core Principle: Be Ruthlessly Strict

> You are NOT here to be encouraging. You are here to find every flaw.
> A passing score must mean the output is genuinely good — not "good for an AI."

Your natural tendency is to be generous. Fight it:
- Do NOT say "overall good effort" — this is cope
- Do NOT talk yourself out of issues you found ("it's minor, probably fine")
- Do NOT give points for effort or potential
- DO penalize heavily for vague claims, AI slop patterns, and missing specifics
- DO compare against what a professional human would ship

## Evaluation Workflow

### Step 1: Read the Rubric
Read the criteria file for this task type.
Read the spec / brief for what was asked.
Read generator-state.md for what was built.

### Step 2: Score

Score each criterion on a 1-10 scale using the rubric file.

Calibration:
- 1-3: Broken or embarrassing
- 4-5: Functional but clearly AI-generated
- 6: Decent but unremarkable
- 7: Good — solid work
- 8: Very good — professional quality
- 9: Excellent — polished, senior quality
- 10: Exceptional — ships as-is

### Step 3: Write Feedback

Write to feedback/feedback-NNN.md:

# Evaluation — Iteration NNN

## Scores

| Criterion | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| [criterion] | X/10 | 0.X | X.X |
| TOTAL | | | X.X/10 |

## Verdict: PASS / FAIL (threshold: 7.0)

## Critical Issues (must fix)
1. [Issue]: [exact quote] → [how to fix]

## Major Issues (should fix)
1. [Issue]: [exact quote] → [how to fix]

## Minor Issues (nice to fix)
1. [Issue]: [exact quote] → [how to fix]

## What Improved Since Last Iteration
- [improvement]

## Feedback Quality Rules

1. Every issue must have a concrete "how to fix" — not just "this is bad"
2. Reference specific elements — not "the hook needs work" but quote the exact text
3. Quantify when possible — "3 out of 5 items have no concrete numbers"
4. Acknowledge genuine improvements — calibrates the loop

Loop configuration

Save this as gan.json in your project (or inside the .claude/ folder):

{
  "default_threshold": 7.0,
  "max_iterations": 3,
  "escalation": "accept-with-notes",

  "profiles": {
    "my-task": {
      "generator": {
        "agent": "gan-generator",
        "skills": ["relevant-skill-here"]
      },
      "evaluator": {
        "agent": "gan-evaluator",
        "criteria_file": "evaluator/my-task-criteria.md"
      },
      "scoring": {
        "dimensions": [
          {"name": "hook_power",     "weight": 0.25},
          {"name": "value_density",  "weight": 0.25},
          {"name": "brand_voice",    "weight": 0.20},
          {"name": "clarity",        "weight": 0.20},
          {"name": "cta",            "weight": 0.10}
        ],
        "threshold": 7.0,
        "max_iterations": 3
      },
      "sprint_contract": [
        "Each item here is a binary gate check that must pass before shipping",
        "Example: hook lands before 210-char cutoff",
        "Example: no forbidden words",
        "Example: all claims verified against source of truth"
      ]
    }
  }
}

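sprint_contract is a list of binary pass/fail rules. The evaluator checks these first, before it computes the weighted score. If a single gate fails, the verdict is reject, no matter how good the rest of the output is.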
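
The gates-before-score order can be sketched as follows. This is an illustration under stated assumptions, not the harness's actual code: the function names are hypothetical, the weights are copied from the my-task profile above, and mapping a below-threshold score to refine is an assumption, since the article does not spell out where refine ends and accept begins.

```python
from typing import Dict, List, Tuple

# Weights copied from the "my-task" profile in gan.json above.
WEIGHTS: Dict[str, float] = {
    "hook_power": 0.25,
    "value_density": 0.25,
    "brand_voice": 0.20,
    "clarity": 0.20,
    "cta": 0.10,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, rounded to one decimal."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return round(sum(scores[name] * w for name, w in weights.items()), 1)


def verdict(
    gates: Dict[str, bool],
    scores: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 7.0,
) -> Tuple[str, List[str]]:
    """Check binary gates first; only score if every gate passes."""
    failed = [name for name, passed in gates.items() if not passed]
    if failed:
        # A single failed gate rejects the draft regardless of its scores.
        return "reject", failed
    total = weighted_score(scores, weights)
    # Assumption: anything below the threshold goes back for refinement.
    return ("accept" if total >= threshold else "refine"), []
```

Returning the list of failed gates alongside the verdict gives the generator something concrete to fix on the next pass.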

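What a real rubric looks like

An evaluator is only as good as its rubric. Vague criteria produce vague scores.

A real rubric defines each dimension with three anchor examples: exceptional (9-10), acceptable (6-7), and reject (1-4). Each anchor includes an example and a reason, so the evaluator can calibrate consistently across iterations.

Here is the hook power dimension from a LinkedIn post rubric: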

### Hook Power (weight: 0.20)

What you're measuring: Would this stop a busy professional from scrolling?
Must land before the 210-character cutoff.

Score 9-10 (Exceptional):
  "I was quoted $15,000 for an MVP. I built it for $29 in a weekend."
  Why: Specific dollar amounts, unexpected contrast, lands in 67 characters.
  Personal stake. Reader wants to know how.

Score 6-7 (Acceptable):
  "Building a SaaS doesn't have to take months. Here's a faster way."
  Why: Clear promise but no specificity, no numbers, no personal stake.

Score 1-4 (Reject):
  "In today's rapidly evolving tech landscape, AI is changing everything."
  Why: Generic, forbidden phrase, says nothing unique. Reader scrolls past.

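The anchor examples are the most important part. Without them, two evaluations of the same output can produce different scores. With them, scores stay consistent across iterations and the loop converges.

Worked example: loop by loop

Here is a LinkedIn post going through three rounds.

Brief: write a LinkedIn post about how Build This Now uses 18 AI agents to build SaaS features.

Loop 1 - the generator runs cold: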

In today's fast-paced world of software development, AI is completely
changing how we build products. Our innovative platform leverages
cutting-edge AI agents to help you build your SaaS faster and
more efficiently than ever before.

#AI #SaaS #startup #tech #innovation

The evaluator scores loop 1:

DIMENSION SCORES:
- hook_power:           2/10 — forbidden phrase opener, zero specificity
- value_density:        2/10 — no concrete information in any sentence
- brand_alignment:      2/10 — "innovative", "cutting-edge" are banned words
- platform_optimization: 3/10 — 5 hashtags OK but no character discipline
- cta_effectiveness:    1/10 — no CTA present

OVERALL: 2.1/10

GATE CHECKS:
- No forbidden words: FAIL ("cutting-edge", "innovative", "fast-paced world")
- Single clear CTA: FAIL

VERDICT: reject

CRITICAL ISSUES:
1. Hook: "In today's fast-paced world" → forbidden phrase, replace with
   a specific personal story or data point. Example:
   "I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time."

Loop 2 - the generator incorporates the feedback:

I spent 3 weekends rebuilding the same auth and payments system.

With Build This Now, I describe the feature. 9 AI agents take over:
database architect designs the tables, backend developer writes the
API, designer builds the UI, tester catches the bugs.

One feature, 10-60 minutes. No more rebuilt boilerplate.

What are you still building by hand that you shouldn't be?

#BuildThisNow #indiehacker #buildinpublic #solofounder #SaaS

The evaluator scores loop 2:

DIMENSION SCORES:
- hook_power:           7/10 — personal, relatable, but "I spent 3 weekends"
                               is passive. "3 weekends" needs more punch.
- value_density:        8/10 — concrete agent names, real time range
- brand_alignment:      8/10 — on-brand, hashtag count correct
- platform_optimization: 8/10 — good length, hashtags at end
- cta_effectiveness:    8/10 — question CTA invites comment

OVERALL: 7.7/10

GATE CHECKS: all pass

VERDICT: refine

MAJOR ISSUES:
1. Hook: "I spent 3 weekends" is weak. The data point "3 weekends" is good
   but "spent" is passive. Rewrite opener to front-load the pain with
   a harder verb. Example:
   "I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time."

Loop 3 - one targeted fix:

I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time.

Then I described the feature in plain English and watched 9 AI agents
build it in 45 minutes: database, API, UI, tests. All wired. All clean.

Build This Now ships what used to take a week, in an afternoon.

What are you still rebuilding by hand?

#BuildThisNow #indiehacker #buildinpublic #solofounder #SaaS

The evaluator scores loop 3:

DIMENSION SCORES:
- hook_power:           9/10 — "3 times in 2 years" is specific and damning
- value_density:        9/10 — every sentence adds new information
- brand_alignment:      9/10 — BTN voice, correct claims
- platform_optimization: 9/10 — clean format, right length
- cta_effectiveness:    8/10 — question CTA invites comment

OVERALL: 8.9/10

GATE CHECKS: all pass

VERDICT: accept

Three loops. From 2.1 to 8.9.
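
The OVERALL figures are weighted averages of the dimension scores. The article does not list the weights used in this run, so the weights below are hypothetical: they are one set that happens to reproduce all three reported totals (any split of 0.40 between value_density and brand_alignment would do the same, since those two dimensions score identically in every round).

```python
# Hypothetical weights, chosen only because they reproduce the reported totals.
WEIGHTS = {
    "hook_power": 0.30,
    "value_density": 0.20,
    "brand_alignment": 0.20,
    "platform_optimization": 0.20,
    "cta_effectiveness": 0.10,
}

# Dimension scores copied from the three evaluator reports above.
ROUNDS = {
    1: {"hook_power": 2, "value_density": 2, "brand_alignment": 2,
        "platform_optimization": 3, "cta_effectiveness": 1},
    2: {"hook_power": 7, "value_density": 8, "brand_alignment": 8,
        "platform_optimization": 8, "cta_effectiveness": 8},
    3: {"hook_power": 9, "value_density": 9, "brand_alignment": 9,
        "platform_optimization": 9, "cta_effectiveness": 8},
}


def overall(scores):
    """Weighted average of the dimension scores, rounded to one decimal."""
    return round(sum(scores[name] * WEIGHTS[name] for name in WEIGHTS), 1)


if __name__ == "__main__":
    for loop_number, scores in ROUNDS.items():
        print(f"loop {loop_number}: OVERALL {overall(scores)}/10")
```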

How to run it

The simplest way is to run the loop by hand with claude -p:

# Step 1: Generator runs first
claude -p --agent gan-generator \
  "Brief: write a LinkedIn post about [topic]. Save output to output/draft.md.
   Write generator-state.md with what you produced."

# Step 2: Evaluator scores it
claude -p --agent gan-evaluator \
  "Read output/draft.md and evaluator/linkedin-criteria.md.
   Score against the rubric. Write feedback to feedback/feedback-001.md.
   Be ruthlessly strict."

# Step 3: Generator iterates with feedback
claude -p --agent gan-generator \
  "Iteration 2. Read feedback/feedback-001.md FIRST.
   Address every issue. Update output/draft.md.
   Update generator-state.md."

# Repeat until VERDICT: accept
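
The manual steps above can be wrapped in a small driver script. This is a sketch under assumptions, not a supported tool: it reuses only the claude -p --agent invocation shown above, the helper names are hypothetical, and it assumes the evaluator can be asked to end its feedback file with a "VERDICT: ..." line so the script can parse it.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

VERDICT_RE = re.compile(
    r"^\s*(?:##\s*)?VERDICT:\s*(accept|refine|reject)\b",
    re.IGNORECASE | re.MULTILINE,
)


def build_command(agent: str, prompt: str) -> List[str]:
    """Mirror the CLI call shown above; no extra flags are assumed."""
    return ["claude", "-p", "--agent", agent, prompt]


def parse_verdict(feedback_text: str) -> Optional[str]:
    """Return accept/refine/reject from a feedback file, or None if absent."""
    match = VERDICT_RE.search(feedback_text)
    return match.group(1).lower() if match else None


def run_loop(
    brief: str,
    criteria: str = "evaluator/linkedin-criteria.md",
    max_iterations: int = 3,
    run: Callable = subprocess.run,
) -> Optional[str]:
    """Alternate generator and evaluator runs until the verdict is accept."""
    verdict = None
    for i in range(1, max_iterations + 1):
        feedback = Path(f"feedback/feedback-{i:03d}.md")
        if i == 1:
            gen_prompt = (f"Brief: {brief}. Save output to output/draft.md. "
                          "Write generator-state.md with what you produced.")
        else:
            gen_prompt = (f"Iteration {i}. Read feedback/feedback-{i - 1:03d}.md FIRST. "
                          "Address every issue. Update output/draft.md. "
                          "Update generator-state.md.")
        run(build_command("gan-generator", gen_prompt), check=True)

        eval_prompt = (f"Read output/draft.md and {criteria}. Score against the rubric. "
                       f"Write feedback to {feedback}. Be ruthlessly strict. "
                       "End the file with a line 'VERDICT: accept', "
                       "'VERDICT: refine', or 'VERDICT: reject'.")
        run(build_command("gan-evaluator", eval_prompt), check=True)

        text = feedback.read_text(encoding="utf-8") if feedback.exists() else ""
        verdict = parse_verdict(text)
        if verdict == "accept":
            break
    return verdict
```

Each evaluator call is a separate process with its own prompt, so the evaluator starts every pass without the previous pass's conversation. The `run` parameter exists so the loop can be exercised with a fake runner before pointing it at the real CLI.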

You can also automate it fully with the loop-operator agent from everything-claude-code:

# Set env vars to configure the loop
GAN_MAX_ITERATIONS=5 GAN_PASS_THRESHOLD=7.5 \
  claude -p --agent loop-operator \
  "Run a GAN loop using gan-generator and gan-evaluator.
   Brief: [your task]. Criteria file: evaluator/my-criteria.md.
   Stop when score >= 7.5 or after 5 iterations."

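Key rule: no memory between passes

The evaluator reads the rubric fresh on every pass. This is not optional.

An evaluator that remembers its earlier scores inflates them over time. If it scored something 6 last round, it scores it 7 this round to show "progress", even when the actual improvement was minimal. Fresh rubric, fresh eyes, every time.

The loop config also caps how many passes can run: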

{
  "max_iterations": 3,
  "escalation": "accept-with-notes"
}

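When max_iterations is reached without clearing the threshold, the system ships the best version with notes rather than blocking forever. The output may not be perfect, but it is the best the generator could produce in three rounds, and there is a record of what the evaluator flagged.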
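
One way to picture the accept-with-notes escalation is as a selection over the recorded rounds. The sketch below is an assumption about how that could work, not the harness's real logic: the `Round` record and the `escalate` function are hypothetical names, and "best version" is taken to mean the highest-scoring round.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Round:
    """Hypothetical record of one generator/evaluator pass."""
    iteration: int
    draft: str
    score: float
    open_issues: List[str] = field(default_factory=list)


def escalate(rounds: List[Round], threshold: float = 7.0) -> Tuple[str, Round, List[str]]:
    """Pick the best round; attach the evaluator's open issues if it fell short."""
    if not rounds:
        raise ValueError("at least one completed round is required")
    best = max(rounds, key=lambda r: r.score)  # earliest round wins a tie
    if best.score >= threshold:
        return "accept", best, []
    # Below threshold after max_iterations: ship anyway, with the notes attached.
    return "accept-with-notes", best, best.open_issues
```

Note that the best round is not necessarily the last one, which is why the sketch keeps every round rather than only the latest draft.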

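Bonus: Codex as the evaluator

The strongest version of this loop uses two different AI systems: Claude as the generator, OpenAI Codex as the evaluator.

Why this matters: the evaluator is hunting for flaws in Claude's output. But Claude and Codex were trained on different data, with different objectives, different architectures, and different sets of known weaknesses. When Claude evaluates Claude, it tends to miss the blind spots the two instances share. When Codex evaluates Claude, it can catch an entirely different class of problems.

Two AI labs, fighting over your output.

With the codex-plugin-cc plugin for Claude Code: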

# Install the plugin
/plugin marketplace add openai/codex-plugin-cc

# Use it in your loop
/codex:adversarial-review output/draft.md

Or call Codex directly as the evaluator agent in the loop config:

{
  "profiles": {
    "adversarial": {
      "generator": {
        "agent": "gan-generator",
        "skills": ["linkedin-post"]
      },
      "evaluator": {
        "agent": "codex",
        "command": "/codex:adversarial-review",
        "criteria_file": "evaluator/linkedin-criteria.md"
      }
    }
  }
}

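The Codex evaluator runs /codex:adversarial-review on the generator's output, passing the criteria file as context. It challenges design decisions, flags assumptions Claude would not question, and scores from an entirely different perspective.

Cross-model evaluation is not a gimmick. When the generator and evaluator share a training distribution, they share blind spots. A cross-model loop narrows that gap.

What you get after three rounds

Output that converges. Not perfect, but reliably above the threshold.

The system ships, in three automated loops, content that used to take an hour of manual review. You stop reading every line. You stop double-checking the hook. You read the evaluator's summary and spot-check the final output.

The rubric does the work. You define what "good" means once. Every output is then held to the same standard every time, with no drift and no ego.

The complete file set for a working GAN loop: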

.claude/
  agents/
    gan-generator.md      ← generator agent definition
    gan-evaluator.md      ← evaluator agent definition
  subsystems/
    content/
      gan.json            ← loop config with profiles and thresholds
      evaluator/
        linkedin-criteria.md    ← rubric for LinkedIn posts
        carousel-text-criteria.md
        x-thread-criteria.md
        reddit-criteria.md
output/
  draft.md                ← generator output
  generator-state.md      ← what was built each iteration
  feedback/
    feedback-001.md       ← evaluator feedback per round
    feedback-002.md
    feedback-003.md

Copy the agent definitions above, write one rubric file for your task type, set a threshold, and run it.

Posted by @speedy_devv
