GAN Loop

Your AI's first draft is confident garbage. Always.

Not because the model is bad, but because nobody is there to push back.

Add a second agent whose only job is to tear the first one's work apart, and quality suddenly converges in as few as two passes. This pattern is called a GAN Loop. Here is the complete implementation guide.

What Is a GAN Loop?

GAN stands for Generative Adversarial Network. In traditional machine learning, two neural networks compete: a generator creates fake images, and a discriminator tries to tell the fakes from the real ones. They train each other until the generator is good enough to fool the discriminator.

We apply the same idea to AI agents. No neural networks required. Just two Claude instances with different jobs and a loop that runs until the output reaches a quality threshold.

The Generator creates. The Evaluator judges. The Generator improves. Repeat until the score clears the threshold or stops improving.
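To make the control flow concrete, here is a minimal Python sketch of that cycle. It is an illustration, not the harness itself: the `generate` and `evaluate` callables and their signatures are assumptions standing in for whatever actually invokes the two agents.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: generate(brief, feedback) -> draft,
# evaluate(draft) -> (score, feedback). In practice each would call an agent.
Generate = Callable[[str, Optional[str]], str]
Evaluate = Callable[[str], Tuple[float, str]]


def gan_loop(generate: Generate, evaluate: Evaluate, brief: str,
             threshold: float = 7.0,
             max_iterations: int = 3) -> Tuple[str, float, int]:
    """Alternate generate and evaluate until the score clears the threshold.

    Returns (draft, score, iterations_used). If no draft passes, the
    best-scoring draft seen so far is returned instead.
    """
    feedback: Optional[str] = None
    best_draft, best_score = "", float("-inf")
    for iteration in range(1, max_iterations + 1):
        draft = generate(brief, feedback)   # the first pass gets no feedback
        score, feedback = evaluate(draft)   # the evaluator sees only the draft
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= threshold:
            return draft, score, iteration
    return best_draft, best_score, max_iterations
```

The generator only ever receives the brief and the latest feedback, and the evaluator only ever receives the draft, which mirrors the separation of roles described next.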

The Problem It Solves

Ask a single AI to write something. It will produce output that looks reasonable. Structured. Confident.

Then look closer. The hook is generic. The numbers are vague. The CTA could apply to anything. The AI had no external pressure, so it optimized for "looks fine on first read" instead of "actually works."

The AI doesn't know what it doesn't know. It cannot assess its own blind spots.

A second agent, reading the output cold with a rubric in hand, catches what the generator missed. And it gives specific, actionable feedback because that is its only job.

The Two Agents

Every GAN Loop has exactly two roles. They can never be the same agent.

The Generator has full context, tools, and creative capabilities. It produces the best output it can from the brief. It does NOT see the evaluation rubric in advance. Giving it the rubric makes it optimize for the rubric instead of genuine quality, which defeats the whole purpose.

The Evaluator has the rubric and the output. That's it. No memory of the generation process. No emotional investment in the draft. It reads cold, scores each dimension with a justification, and returns a structured verdict: accept, refine, or reject. When it rejects, it provides exact quotes from the output, explains the problem, and gives a concrete rewrite.

This separation is everything. The generator is optimistic by nature. The evaluator is adversarial by design. Neither role works well if the same agent plays both.
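One way to picture the Evaluator's structured verdict is as a small typed record. The sketch below is an assumption about shape, not a format the harness requires: the class names `Issue` and `Evaluation` and their fields are hypothetical, chosen to match the description above (per-dimension score plus justification, one of three verdicts, quoted issues with a rewrite).

```python
from dataclasses import dataclass, field
from typing import Dict, List

VERDICTS = ("accept", "refine", "reject")


@dataclass
class Issue:
    severity: str  # "critical", "major", or "minor"
    quote: str     # exact text copied from the output
    problem: str   # what is wrong with it
    rewrite: str   # concrete replacement text


@dataclass
class Evaluation:
    scores: Dict[str, int]          # dimension name -> score from 1 to 10
    justifications: Dict[str, str]  # dimension name -> why it got that score
    verdict: str
    issues: List[Issue] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        for name, score in self.scores.items():
            if not 1 <= score <= 10:
                raise ValueError(f"{name}: score {score} is outside 1-10")
            if not self.justifications.get(name):
                raise ValueError(f"{name}: score has no justification")
        # A rejection without a quoted issue gives the Generator nothing to fix.
        if self.verdict == "reject" and not self.issues:
            raise ValueError("a reject verdict needs at least one issue")
```

Validating the record at construction time means a lazy evaluation (a score with no reason, a rejection with no quote) fails loudly instead of silently weakening the loop.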

The Agent Definitions

Here are the two agent files you need. Save them in .claude/agents/ in your project.

The Generator Agent

---
name: gan-generator
description: "GAN Harness — Generator agent. Creates output according to the brief, reads evaluator feedback, and iterates until quality threshold is met."
tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
model: claude-sonnet-4-6
---

You are the Generator in a GAN-style multi-agent harness.

## Your Role

You are the Creator. You build the output according to the spec.
After each iteration, the Evaluator will score your work.
You then read the feedback and improve.

## Key Rules

1. Read the spec first — always start by reading the brief or spec file
2. Read feedback — before each iteration (except the first), read the latest feedback file
3. Address every issue — feedback items are not suggestions, fix them all
4. Do not self-evaluate — your job is to create, not to judge
5. Commit between iterations — so the Evaluator sees clean diffs

## Workflow

### First Iteration
1. Read the brief / spec
2. Produce the output (post, code, design, document — whatever is specified)
3. Write generator-state.md: what you built, known issues, open questions

### Subsequent Iterations
1. Read feedback/feedback-NNN.md (latest)
2. List every issue the Evaluator raised
3. Fix by priority: critical issues first, then major, then minor
4. Update generator-state.md

## Generator State File

Write to generator-state.md after each iteration:

# Generator State — Iteration NNN

## What Was Built
- [output 1]
- [output 2]

## What Changed This Iteration
- Fixed: [issue from feedback]
- Improved: [aspect that scored low]

## Known Issues
- [anything you could not fix]

The Evaluator Agent

---
name: gan-evaluator
description: "GAN Harness — Evaluator agent. Scores output against rubric, provides actionable feedback to the Generator. Be ruthlessly strict."
tools: ["Read", "Write", "Grep", "Glob"]
model: claude-sonnet-4-6
---

You are the Evaluator in a GAN-style multi-agent harness.

## Your Role

You are the Critic. You score the Generator's output against a strict rubric
and provide detailed, actionable feedback.

## Core Principle: Be Ruthlessly Strict

> You are NOT here to be encouraging. You are here to find every flaw.
> A passing score must mean the output is genuinely good — not "good for an AI."

Your natural tendency is to be generous. Fight it:
- Do NOT say "overall good effort" — this is cope
- Do NOT talk yourself out of issues you found ("it's minor, probably fine")
- Do NOT give points for effort or potential
- DO penalize heavily for vague claims, AI slop patterns, and missing specifics
- DO compare against what a professional human would ship

## Evaluation Workflow

### Step 1: Read the Rubric
Read the criteria file for this task type.
Read the spec / brief for what was asked.
Read generator-state.md for what was built.

### Step 2: Score

Score each criterion on a 1-10 scale using the rubric file.

Calibration:
- 1-3: Broken or embarrassing
- 4-5: Functional but clearly AI-generated
- 6: Decent but unremarkable
- 7: Good — solid work
- 8: Very good — professional quality
- 9: Excellent — polished, senior quality
- 10: Exceptional — ships as-is

### Step 3: Write Feedback

Write to feedback/feedback-NNN.md:

# Evaluation — Iteration NNN

## Scores

| Criterion | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| [criterion] | X/10 | 0.X | X.X |
| TOTAL | | | X.X/10 |

## Verdict: PASS / FAIL (threshold: 7.0)

## Critical Issues (must fix)
1. [Issue]: [exact quote] → [how to fix]

## Major Issues (should fix)
1. [Issue]: [exact quote] → [how to fix]

## Minor Issues (nice to fix)
1. [Issue]: [exact quote] → [how to fix]

## What Improved Since Last Iteration
- [improvement]

## Feedback Quality Rules

1. Every issue must have a concrete "how to fix" — not just "this is bad"
2. Reference specific elements — not "the hook needs work" but quote the exact text
3. Quantify when possible — "3 out of 5 items have no concrete numbers"
4. Acknowledge genuine improvements — calibrates the loop
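
That is the end of the evaluator agent file. If you want a script to act on its output, the feedback file has to be read back by code at some point. The following Python sketch pulls the total and the verdict out of a feedback file, assuming the Evaluator followed the template above exactly (a `TOTAL` table row ending in `X.X/10` and a `## Verdict:` heading). The function name `parse_feedback` is hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_feedback(markdown: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (total_score, verdict) from a feedback file, or None where missing."""
    total: Optional[float] = None
    verdict: Optional[str] = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and "TOTAL" in stripped:
            # Expect the weighted total written as "8.0/10" in the table row.
            match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", stripped)
            if match:
                total = float(match.group(1))
        elif stripped.startswith("## Verdict:"):
            words = stripped[len("## Verdict:"):].split()
            verdict = words[0].upper() if words else None
    return total, verdict
```

Because model output drifts from templates, treat a `None` in either position as a reason to stop and inspect the file by hand rather than as a failing score.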

The Loop Configuration

Save this file as gan.json in your project (or inside your .claude/ folder):

{
  "default_threshold": 7.0,
  "max_iterations": 3,
  "escalation": "accept-with-notes",

  "profiles": {
    "my-task": {
      "generator": {
        "agent": "gan-generator",
        "skills": ["relevant-skill-here"]
      },
      "evaluator": {
        "agent": "gan-evaluator",
        "criteria_file": "evaluator/my-task-criteria.md"
      },
      "scoring": {
        "dimensions": [
          {"name": "hook_power",     "weight": 0.25},
          {"name": "value_density",  "weight": 0.25},
          {"name": "brand_voice",    "weight": 0.20},
          {"name": "clarity",        "weight": 0.20},
          {"name": "cta",            "weight": 0.10}
        ],
        "threshold": 7.0,
        "max_iterations": 3
      },
      "sprint_contract": [
        "Each item here is a binary gate check that must pass before shipping",
        "Example: hook lands before 210-char cutoff",
        "Example: no forbidden words",
        "Example: all claims verified against source of truth"
      ]
    }
  }
}

The sprint_contract is a list of binary pass/fail rules. The Evaluator checks these first, before it even computes the weighted score. A single failed gate means reject, no matter how good the rest is.
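The ordering (gates first, weighted score second) can be sketched in a few lines of Python. The weights and threshold below are copied from the gan.json example; the function names, the sample scores in the usage note, and the choice to map a below-threshold score to "refine" are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Scoring section copied from the "my-task" profile in gan.json.
SCORING = {
    "dimensions": [
        {"name": "hook_power", "weight": 0.25},
        {"name": "value_density", "weight": 0.25},
        {"name": "brand_voice", "weight": 0.20},
        {"name": "clarity", "weight": 0.20},
        {"name": "cta", "weight": 0.10},
    ],
    "threshold": 7.0,
}


def weighted_score(scores: Dict[str, float], dimensions: List[dict]) -> float:
    """Sum score * weight over all dimensions; weights must add up to 1."""
    total_weight = sum(d["weight"] for d in dimensions)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(scores[d["name"]] * d["weight"] for d in dimensions)


def judge(scores: Dict[str, float], gates: Dict[str, bool],
          scoring: dict) -> Tuple[str, Optional[float], List[str]]:
    """Return (verdict, total, failed_gates). Gates are checked before scoring."""
    failed = [name for name, passed in gates.items() if not passed]
    if failed:
        return "reject", None, failed  # one failed gate rejects outright
    total = round(weighted_score(scores, scoring["dimensions"]), 2)
    # Assumption: anything below the threshold goes back for another pass.
    verdict = "accept" if total >= scoring["threshold"] else "refine"
    return verdict, total, []
```

With hypothetical scores of 8, 7, 9, 8, and 6 across the five dimensions and every gate passing, `judge` computes a total of 7.75 and accepts. Flip any single gate to `False` and the same scores come back as a reject with no total at all.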

What a Real Rubric Looks Like

The Evaluator is only as good as its rubric. Vague criteria produce vague scores.

A real rubric defines each dimension with three anchor examples: exceptional (9-10), acceptable (6-7), and reject (1-4). Each anchor includes an example AND a reason, so the Evaluator can calibrate consistently across iterations.

Here is the Hook Power dimension from the LinkedIn post rubric:

### Hook Power (weight: 0.20)

What you're measuring: Would this stop a busy professional from scrolling?
Must land before the 210-character cutoff.

Score 9-10 (Exceptional):
  "I was quoted $15,000 for an MVP. I built it for $197 in a weekend."
  Why: Specific dollar amounts, unexpected contrast, lands in 66 characters.
  Personal stake. Reader wants to know how.

Score 6-7 (Acceptable):
  "Building a SaaS doesn't have to take months. Here's a faster way."
  Why: Clear promise but no specificity, no numbers, no personal stake.

Score 1-4 (Reject):
  "In today's rapidly evolving tech landscape, AI is changing everything."
  Why: Generic, forbidden phrase, says nothing unique. Reader scrolls past.

The anchor examples are the most important part. Without them, two evaluations of the same output will score it differently on every run. With them, scores stay consistent across iterations, which is what makes the loop converge.
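Some parts of a rubric do not need a model's judgment at all. The 210-character cutoff and the forbidden-phrase rule can be checked mechanically, which keeps those results identical on every run. The Python sketch below assumes the hook is the first non-empty line of the post and that the forbidden phrases are supplied by the caller; both function names are hypothetical.

```python
from typing import List

HOOK_CUTOFF = 210  # character limit quoted in the rubric


def hook_lands_before_cutoff(post: str, cutoff: int = HOOK_CUTOFF) -> bool:
    """True if the first non-empty line (assumed to be the hook) fits the cutoff."""
    hook = next((line.strip() for line in post.splitlines() if line.strip()), "")
    return 0 < len(hook) <= cutoff


def find_forbidden(post: str, phrases: List[str]) -> List[str]:
    """Return the forbidden phrases found in the post, ignoring case."""
    lowered = post.lower()
    return [phrase for phrase in phrases if phrase.lower() in lowered]
```

Checks like these fit naturally as sprint_contract gates, leaving the Evaluator to spend its attention on the dimensions that genuinely need reading comprehension.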

A Real Example: Loop by Loop

Here is a LinkedIn post going through 3 rounds.

Brief: Write a LinkedIn post about how Build This Now uses 18 AI agents to build SaaS features.

Loop 1 (the Generator runs cold):

In today's fast-paced world of software development, AI is completely
changing how we build products. Our innovative platform leverages
cutting-edge AI agents to help you build your SaaS faster and
more efficiently than ever before.

#AI #SaaS #startup #tech #innovation

The Evaluator scores Loop 1:

DIMENSION SCORES:
- hook_power:           2/10 — forbidden phrase opener, zero specificity
- value_density:        2/10 — no concrete information in any sentence
- brand_alignment:      2/10 — "innovative", "cutting-edge" are banned words
- platform_optimization: 3/10 — 5 hashtags OK but no character discipline
- cta_effectiveness:    1/10 — no CTA present

OVERALL: 2.1/10

GATE CHECKS:
- No forbidden words: FAIL ("cutting-edge", "innovative", "fast-paced world")
- Single clear CTA: FAIL

VERDICT: reject

CRITICAL ISSUES:
1. Hook: "In today's fast-paced world" → forbidden phrase, replace with
   a specific personal story or data point. Example:
   "I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time."

Loop 2 (the Generator incorporates the feedback):

I spent 3 weekends rebuilding the same auth and payments system.

With Build This Now, I describe the feature. 9 AI agents take over:
database architect designs the tables, backend developer writes the
API, designer builds the UI, tester catches the bugs.

One feature, 10-60 minutes. No more rebuilt boilerplate.

What are you still building by hand that you shouldn't be?

#BuildThisNow #indiehacker #buildinpublic #solofounder #SaaS

The Evaluator scores Loop 2:

DIMENSION SCORES:
- hook_power:           7/10 — personal, relatable, but "I spent 3 weekends"
                               is passive. "3 weekends" needs more punch.
- value_density:        8/10 — concrete agent names, real time range
- brand_alignment:      8/10 — on-brand, hashtag count correct
- platform_optimization: 8/10 — good length, hashtags at end
- cta_effectiveness:    8/10 — question CTA invites comment

OVERALL: 7.7/10

GATE CHECKS: all pass

VERDICT: refine

MAJOR ISSUES:
1. Hook: "I spent 3 weekends" is weak. The data point "3 weekends" is good
   but "spent" is passive. Rewrite opener to front-load the pain with
   a harder verb. Example:
   "I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time."

Loop 3 (one targeted fix):

I rebuilt auth and payments 3 times in 2 years. Same 3-week slog. Every time.

Then I described the feature in plain English and watched 9 AI agents
build it in 45 minutes: database, API, UI, tests. All wired. All clean.

Build This Now ships what used to take a week, in an afternoon.

What are you still rebuilding by hand?

#BuildThisNow #indiehacker #buildinpublic #solofounder #SaaS

The Evaluator scores Loop 3:

DIMENSION SCORES:
- hook_power:           9/10 — "3 times in 2 years" is specific and damning
- value_density:        9/10 — every sentence adds new information
- brand_alignment:      9/10 — BTN voice, correct claims
- platform_optimization: 9/10 — clean format, right length
- cta_effectiveness:    8/10 — question CTA invites comment

OVERALL: 8.9/10

GATE CHECKS: all pass

VERDICT: accept

Three loops. From 2.1 to 8.9.

How to Run It

The simplest way is to run the loop manually with claude -p:

# Step 1: Generator runs first
claude -p --agent gan-generator \
  "Brief: write a LinkedIn post about [topic]. Save output to output/draft.md.
   Write generator-state.md with what you produced."

# Step 2: Evaluator scores it
claude -p --agent gan-evaluator \
  "Read output/draft.md and evaluator/linkedin-criteria.md.
   Score against the rubric. Write feedback to feedback/feedback-001.md.
   Be ruthlessly strict."

# Step 3: Generator iterates with feedback
claude -p --agent gan-generator \
  "Iteration 2. Read feedback/feedback-001.md FIRST.
   Address every issue. Update output/draft.md.
   Update generator-state.md."

# Repeat until VERDICT: accept
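
If typing those three commands by hand gets old, the same steps can be generated by a small script. The Python sketch below only reuses the command shape and prompts shown above (`claude -p --agent ...`); the function names are hypothetical, and it assumes the `claude` CLI is on your PATH and that feedback files are numbered with three digits.

```python
import subprocess
from typing import List


def generator_cmd(iteration: int, topic: str) -> List[str]:
    """Build the Generator command for a given iteration (1-based)."""
    if iteration == 1:
        prompt = (f"Brief: write a LinkedIn post about {topic}. "
                  "Save output to output/draft.md. "
                  "Write generator-state.md with what you produced.")
    else:
        previous = f"feedback/feedback-{iteration - 1:03d}.md"
        prompt = (f"Iteration {iteration}. Read {previous} FIRST. "
                  "Address every issue. Update output/draft.md. "
                  "Update generator-state.md.")
    return ["claude", "-p", "--agent", "gan-generator", prompt]


def evaluator_cmd(iteration: int) -> List[str]:
    """Build the Evaluator command that writes feedback for this iteration."""
    target = f"feedback/feedback-{iteration:03d}.md"
    prompt = ("Read output/draft.md and evaluator/linkedin-criteria.md. "
              f"Score against the rubric. Write feedback to {target}. "
              "Be ruthlessly strict.")
    return ["claude", "-p", "--agent", "gan-evaluator", prompt]


def run_round(iteration: int, topic: str) -> None:
    """Run one Generator pass followed by one Evaluator pass."""
    subprocess.run(generator_cmd(iteration, topic), check=True)
    subprocess.run(evaluator_cmd(iteration), check=True)
```

Calling `run_round(1, "your topic")`, then `run_round(2, "your topic")`, and so on reproduces the manual sequence. Each call starts a fresh `claude` process, so the Evaluator carries nothing over from the previous round.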

You can also run it fully automated with the loop-operator agent from everything-claude-code:

# Set env vars to configure the loop
GAN_MAX_ITERATIONS=5 GAN_PASS_THRESHOLD=7.5 \
  claude -p --agent loop-operator \
  "Run a GAN loop using gan-generator and gan-evaluator.
   Brief: [your task]. Criteria file: evaluator/my-criteria.md.
   Stop when score >= 7.5 or after 5 iterations."

The Critical Rule: No Memory Between Passes

The Evaluator reads the rubric fresh on every pass. This is not optional.

An evaluator with memory of previous scores will inflate them over time. It scored something a 6 last round, so it scores a 7 this round to show "progress," even if the real improvement was minimal. Fresh rubric, fresh eyes, every time.

The loop configuration also caps how many passes can run:

{
  "max_iterations": 3,
  "escalation": "accept-with-notes"
}

When max_iterations is reached without clearing the threshold, the system ships the best version with notes instead of blocking forever. The output may not be perfect, but it is the best the generator could produce in 3 rounds, and you have a record of what the evaluator flagged.
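Here is a minimal Python sketch of that escalation rule. The function name `finalize` and the shape of the history records are assumptions; only the "accept-with-notes" value comes from the config above, so any other escalation value is treated as unknown.

```python
from typing import Dict, List, Tuple

# One record per iteration: (draft, score, evaluator notes). Hypothetical shape.
Round = Tuple[str, float, List[str]]


def finalize(history: List[Round], threshold: float,
             escalation: str = "accept-with-notes") -> Dict[str, object]:
    """Pick what to ship once the loop has stopped."""
    if not history:
        raise ValueError("no iterations were run")
    draft, score, notes = max(history, key=lambda record: record[1])
    if score >= threshold:
        return {"status": "accept", "draft": draft, "score": score, "notes": []}
    if escalation == "accept-with-notes":
        # Ship the best-scoring draft, keeping what the evaluator flagged on it.
        return {"status": "accept-with-notes", "draft": draft,
                "score": score, "notes": notes}
    raise ValueError(f"unknown escalation policy: {escalation!r}")
```

Note that the best draft is not necessarily the last one. If a later pass scored lower than an earlier one, this sketch ships the earlier draft along with the notes written against it.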

Bonus: Use Codex as the Evaluator

The most powerful version of this loop uses two different AI systems. Claude as the generator. OpenAI Codex as the evaluator.

Why this matters: the evaluator is hunting for flaws in Claude's output. But Claude and Codex were trained on different data, with different objectives, different architectures, and different sets of known weaknesses. Claude evaluating Claude misses the blind spots they share. Codex evaluating Claude finds a different class of problems entirely.

Two AI labs, fighting over your output.

If you have codex-plugin-cc installed in Claude Code:

# Install the plugin
/plugin marketplace add openai/codex-plugin-cc

# Use it in your loop
/codex:adversarial-review output/draft.md

Or invoke Codex as the evaluator agent directly in your loop configuration:

{
  "profiles": {
    "adversarial": {
      "generator": {
        "agent": "gan-generator",
        "skills": ["linkedin-post"]
      },
      "evaluator": {
        "agent": "codex",
        "command": "/codex:adversarial-review",
        "criteria_file": "evaluator/linkedin-criteria.md"
      }
    }
  }
}

The Codex evaluator runs /codex:adversarial-review on the generator's output and passes the criteria file as context. It will challenge design decisions, flag assumptions Claude does not question, and score from a completely different perspective.

Cross-model evaluation is not a gimmick. When the generator and the evaluator share the same training distribution, they share the same blind spots. A cross-model loop closes that gap.

What You Get After 3 Rounds

Output that converges. Not perfect, but reliably above the threshold.

In 3 automated loops, the system ships content that would have taken an hour of manual review. You stop reading every line. You stop second-guessing hooks. You read the evaluator's summary and spot-check the final output.

The rubric does the work. You define "good" once. Every piece of output is measured against that same standard, every time, with no drift and no ego.

The complete set of files for a working GAN Loop:

.claude/
  agents/
    gan-generator.md      ← generator agent definition
    gan-evaluator.md      ← evaluator agent definition
  subsystems/
    content/
      gan.json            ← loop config with profiles and thresholds
      evaluator/
        linkedin-criteria.md    ← rubric for LinkedIn posts
        carousel-text-criteria.md
        x-thread-criteria.md
        reddit-criteria.md
output/
  draft.md                ← generator output
  generator-state.md      ← what was built each iteration
  feedback/
    feedback-001.md       ← evaluator feedback per round
    feedback-002.md
    feedback-003.md
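
Both agents depend on the numbered feedback files: the Generator reads the latest one and the Evaluator writes the next one. A small Python helper can resolve both paths, assuming the three-digit `feedback-NNN.md` naming shown in the tree. The function names are hypothetical, and files numbered above 999 are deliberately not matched.

```python
import re
from pathlib import Path
from typing import Optional

FEEDBACK_NAME = re.compile(r"^feedback-(\d{3})\.md$")


def latest_feedback(feedback_dir: Path) -> Optional[Path]:
    """Return the highest-numbered feedback file, or None if there is none."""
    numbered = []
    for path in feedback_dir.glob("feedback-*.md"):
        match = FEEDBACK_NAME.match(path.name)
        if match:  # skips files such as feedback-notes.md
            numbered.append((int(match.group(1)), path))
    if not numbered:
        return None
    return max(numbered, key=lambda pair: pair[0])[1]


def next_feedback_path(feedback_dir: Path) -> Path:
    """Return the path the Evaluator should write for the coming round."""
    latest = latest_feedback(feedback_dir)
    if latest is None:
        number = 1
    else:
        number = int(FEEDBACK_NAME.match(latest.name).group(1)) + 1
    return feedback_dir / f"feedback-{number:03d}.md"
```

For example, with feedback-001.md and feedback-002.md on disk, `latest_feedback` returns the second file and `next_feedback_path` points at feedback-003.md.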

Copy the agent definitions above, write a rubric file for your task type, set a threshold, and run it.

Posted by @speedy_devv
