Build This Now
speedy_devvkoen_salo

How Does AI Image Generation Work? (The Noise-to-Picture Trick)

AI image generators like Midjourney and DALL-E start with pure visual static and slowly remove the noise until a picture appears — guided by your words. Here's how diffusion actually works, explained simply.

Stop configuring. Start building.

SaaS templates with AI orchestration.

Published Jun 13, 2026 · 8 min read

AI image generators work by starting with a screen of pure random static — like an untuned TV — and then removing the noise a little at a time until a picture emerges, with your text prompt steering what that picture becomes. It's closer to a sculptor revealing a statue inside a block of marble than to a painter adding strokes to a blank canvas. The technique is called diffusion, and once you see it, the whole thing makes sense — including why AI used to give everyone six fingers.

This is a completely different kind of model from the language models behind ChatGPT. LLMs predict text; diffusion models denoise images. Here's how the second one works.

Table of Contents

  1. The Core Idea: Sculpting Away Noise
  2. How It Learned: Add Noise, Then Reverse It
  3. Where Your Words Come In
  4. Why It Used to Mess Up Hands
  5. Why the Same Prompt Gives Different Images
  6. How AI Video Builds on This
  7. Frequently Asked Questions


The Core Idea: Sculpting Away Noise

Picture a TV tuned to static — a screen of random colored dots. Now imagine a machine that looks at that static and asks, "If there were a picture of a cat hidden in here, which dots should I nudge to make it slightly more cat-like?" It makes a small adjustment. Then it asks again. And again — typically 20 to 50 times.

With each pass, the random noise gets a little more organized, a little more like the thing you asked for, until a clean image is sitting where the static used to be. That's diffusion: not painting a picture, but progressively denoising random static into one.
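Here is a toy version of that loop in Python. It is a cartoon, not a real diffusion model. The "image" is a list of 16 numbers, and `toy_denoiser` is a hypothetical stand-in whose only idea of a plausible picture is that every pixel is pure black (0.0) or pure white (1.0). A real network makes a far richer guess, one that shifts as the image sharpens and is steered by your prompt, but the loop has roughly the same shape: guess the clean image, nudge a little toward it, repeat.

```python
import random


def toy_denoiser(image):
    """Hypothetical stand-in for a trained network.

    Its only idea of a plausible picture: every pixel is pure black (0.0) or
    pure white (1.0). Its best guess at the clean image is therefore each pixel
    snapped to the nearer of the two.
    """
    return [1.0 if p >= 0.5 else 0.0 for p in image]


def generate(num_pixels=16, steps=30, seed=0):
    rng = random.Random(seed)
    # Start from "TV static": random pixel values scattered around mid-gray.
    image = [rng.gauss(0.5, 0.5) for _ in range(num_pixels)]
    for remaining in range(steps, 0, -1):
        guess = toy_denoiser(image)
        # Nudge each pixel a fraction of the way toward the guess. The fraction
        # (1 / remaining) starts small and reaches 1 on the final pass.
        image = [p + (g - p) / remaining for p, g in zip(image, guess)]
    return image


print(generate())
```

Change `seed` and the same loop settles on a different black-and-white pattern, because the finished picture depends on where the static started.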

How It Learned: Add Noise, Then Reverse It

The clever part is how the model learned to do this. During training, it was shown millions of real images, and for each one it did the process backwards:

  1. Take a real photo (say, a dog).
  2. Gradually add noise to it, step by step, until it's pure static. The model watches this happen.
  3. Learn to undo each step: given the noisy version, predict the noise that was just added. Knowing the noise is enough to work out what the image looked like one step less noisy.

Do that across millions of images and the model becomes an expert at one thing: taking a noisy image and making it slightly cleaner. To generate a new image, you just start it at the end — pure noise — and let it run its cleanup process. Because it learned from real images, the "clean" version it heads toward looks like a real image too.

Stage        What the model sees                                  Outcome
Training     Real image → add noise step by step → pure static    Learns how to reverse one step of noise
Generating   Pure static → remove noise step by step → image      Produces a brand-new image
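Both stages can be sketched in plain Python, assuming the Gaussian-noise formulation and linear noise schedule of the original DDPM paper (Ho et al., 2020). Production systems differ in many details, and many of them work on a compressed representation of the image rather than raw pixels. Two things in the sketch go beyond the plain-English description. First, training does not replay every noising step: adding many small doses of Gaussian noise adds up to one larger dose, so it jumps straight to a random step `t`. Second, the network is asked to predict the noise that was added. The names `make_training_pair`, `sample`, and `cheating_predictor` are hypothetical, and the network itself is left out.

```python
import math
import random

# Noise schedule (assumption: the linear schedule from the DDPM paper).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # alpha_bars[t]: share of the original signal left at step t
survive = 1.0
for a in alphas:
    survive *= a
    alpha_bars.append(survive)


def make_training_pair(clean, rng):
    """Training: pick a random step t and jump straight to that noise level.

    The network would see (noisy, t) and be scored (mean squared error) on how
    well it predicts `noise`. The network and its weight update are omitted.
    """
    t = rng.randrange(T)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    keep, mix = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    noisy = [keep * x + mix * e for x, e in zip(clean, noise)]
    return (noisy, t), noise


def sample(predict_noise, num_pixels, rng):
    """Generating: DDPM-style sampling from pure static.

    `predict_noise(image, t)` stands in for the trained network.
    """
    image = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # pure static
    for t in range(T - 1, -1, -1):
        eps = predict_noise(image, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Subtract the predicted noise and rescale: a slightly cleaner image.
        image = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(image, eps)]
        if t > 0:
            # Every step except the last adds back a little fresh noise.
            sigma = math.sqrt(betas[t])
            image = [x + sigma * rng.gauss(0.0, 1.0) for x in image]
    return image


# Usage with a hypothetical "cheating" predictor that already knows the answer.
# No real model has this; it only lets the loop run end to end.
target = [0.0, 0.5, 1.0]


def cheating_predictor(image, t):
    return [
        (x - math.sqrt(alpha_bars[t]) * c) / math.sqrt(1.0 - alpha_bars[t])
        for x, c in zip(image, target)
    ]


(noisy, t), noise = make_training_pair(target, random.Random(0))
result = sample(cheating_predictor, len(target), random.Random(1))
print([round(v, 3) for v in result])  # approximately equal to target
```

The 1,000-step loop is the original, slow sampler. Later samplers reuse the same trained network but need far fewer steps, closer to the 20 to 50 passes described above.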

Where Your Words Come In

Left alone, the model would denoise toward some plausible image, but not necessarily what you want. Your text prompt is the steering wheel.

The words "a red bicycle on a beach at sunset" are first turned into numbers the model understands (the same kind of meaning-coordinates used in embeddings). At every denoising step, the model nudges the image not just toward "a realistic picture" but toward "a realistic picture that matches these words." A setting usually called guidance strength controls how hard that pull is. Stronger guidance sticks closer to your prompt, while extra steps mostly buy finer detail rather than a closer match.
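Many diffusion systems, Stable Diffusion among them, steer with a technique called classifier-free guidance. At each step the network predicts the noise twice, once with your prompt and once with an empty prompt, and the step then follows the difference between the two, amplified by a guidance scale. Below is a minimal sketch of just that combining step. The function name is hypothetical, the network calls are left out, and the numbers are made up for illustration.

```python
def classifier_free_guidance(noise_without_prompt, noise_with_prompt, guidance_scale):
    """Combine two noise predictions for one denoising step.

    guidance_scale = 0 ignores the prompt, 1 uses the prompted prediction as-is,
    and values above 1 push further in the direction the prompt points.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_without_prompt, noise_with_prompt)
    ]


# Made-up predictions for a 3-pixel image at a single step.
uncond = [0.2, -0.1, 0.4]
cond = [0.3, -0.3, 0.4]
print(classifier_free_guidance(uncond, cond, 7.5))
```

Where the two predictions agree (the last pixel), guidance changes nothing. Where they disagree, a high scale exaggerates the prompt's influence. That is why very high values tend to follow the prompt more literally at some cost in variety.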

Why It Used to Mess Up Hands

The infamous six-fingered hands weren't random: they're a direct clue to how diffusion works. The model never learned "a hand has exactly five fingers" as a rule. It learned what hands tend to look like: skin-toned, with several finger-shaped protrusions. Because it builds the image from blurry noise up to fine detail, and hands appear in training photos in countless poses with fingers hidden or overlapping, it often settled on "about the right number" of fingers rather than exactly five.

Modern models (2026) mostly fixed this with better training and more parameters — but the lesson holds: these models reproduce statistical patterns, not hard rules. They're brilliant at vibes, historically shaky on exact counts, text in images, and rigid geometry.

Why the Same Prompt Gives Different Images

Each generation starts from a different patch of random noise (a "seed"). Different starting static, denoised toward the same prompt, lands on a different final image — the same way two sculptors handed different marble blocks would carve slightly different statues of the same subject. Lock the seed and you can reproduce the exact image; change it and you get fresh variations.
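The seed is just the starting value for a random number generator, so the same seed always produces the same starting static. A minimal sketch with Python's built-in `random` module follows. `starting_static` is a hypothetical helper, and real tools draw far more numbers. Libraries such as Hugging Face diffusers, for example, let you pass a seeded PyTorch generator to the pipeline.

```python
import random


def starting_static(seed, num_pixels=16):
    """Draw the initial noise for one generation from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]


same_a = starting_static(seed=1234)
same_b = starting_static(seed=1234)
other = starting_static(seed=5678)
print(same_a == same_b)  # True: same seed, identical static
print(same_a == other)   # False: different seed, different static
```

Reproducing an image exactly also assumes the same model, prompt, and settings, and in practice often the same software and hardware, since the denoiser then turns identical static into an identical picture.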

How AI Video Builds on This

AI video (Sora, Veo, and others) extends diffusion across time: it denoises many frames together while trying to keep them consistent with one another. That consistency is the hard part, and it's exactly why AI video sometimes flickers, morphs objects, or drifts in physics. Every frame still starts out as noise, and the model only approximately keeps track of how each frame should match its neighbors. Those tiny inconsistencies are, conveniently, also how you can often spot an AI-generated clip.


Frequently Asked Questions

How does AI image generation actually work?

It uses a technique called diffusion. The model starts with a field of random visual noise and removes that noise step by step — usually 20 to 50 times — nudging the image toward something realistic that matches your text prompt, until a finished picture emerges.

What is diffusion in AI?

Diffusion is the process of turning random noise into a coherent image by repeatedly "denoising" it. The model learned this by watching millions of real images get progressively corrupted into static, then learning to reverse each step. To make new images, it runs that reversal starting from pure noise.

Why does AI image generation get hands and text wrong?

Because the model learned statistical patterns of what things look like, not hard rules like "hands have five fingers" or how letters form words. It builds images from blurry to sharp, so exact counts, text, and rigid geometry are historically weak spots — though 2026 models have improved a lot.

Why do I get a different image each time with the same prompt?

Each run starts from a different patch of random noise, called a seed. Denoising different starting static toward the same prompt produces different final images. If you fix the seed, you can reproduce the exact same image.

Is AI image generation the same as ChatGPT?

No. ChatGPT is a language model that predicts text. Image generators use diffusion models that denoise images. They're different architectures for different jobs, though both turn your words into numbers to guide the output.




