Build This Now
speedy_devv · koen_salo

How Does AI Image Generation Work? (The Noise-to-Picture Trick)

AI image generators like Midjourney and DALL-E start with pure visual static and slowly remove the noise until a picture appears — guided by your words. Here's how diffusion actually works, explained simply.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Jun 13, 2026 · 8 min read

AI image generators work by starting with a screen of pure random static — like an untuned TV — and then removing the noise a little at a time until a picture emerges, with your text prompt steering what that picture becomes. It's closer to a sculptor revealing a statue inside a block of marble than to a painter adding strokes to a blank canvas. The technique is called diffusion, and once you see it, the whole thing makes sense — including why AI used to give everyone six fingers.

This is a completely different kind of model from the language models behind ChatGPT. LLMs predict text; diffusion models denoise images. Here's how the second one works.

Table of Contents

  1. The Core Idea: Sculpting Away Noise
  2. How It Learned: Add Noise, Then Reverse It
  3. Where Your Words Come In
  4. Why It Used to Mess Up Hands
  5. Why the Same Prompt Gives Different Images
  6. How AI Video Builds on This
  7. Frequently Asked Questions

The Core Idea: Sculpting Away Noise

Picture a TV tuned to static — a screen of random colored dots. Now imagine a machine that looks at that static and asks, "If there were a picture of a cat hidden in here, which dots should I nudge to make it slightly more cat-like?" It makes a small adjustment. Then it asks again. And again — typically 20 to 50 times.

With each pass, the random noise gets a little more organized, a little more like the thing you asked for, until a clean image is sitting where the static used to be. That's diffusion: not painting a picture, but progressively denoising random static into one.
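The loop described above can be sketched in a few lines. This is a toy, not a real image generator: the hypothetical `toy_denoise` function cheats by already knowing the target picture, where a real model would have to guess it from the noisy image and the prompt. It only shows the shape of the process, many small nudges that turn static into a picture.

```python
import numpy as np

def toy_denoise(target, steps=30, seed=0):
    """Start from random static and nudge it toward `target` a little each step."""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal(target.shape)  # pure static
    errors = []
    for step in range(steps):
        # Stand-in for the trained network (an assumption for illustration):
        # a real model predicts the clean image; here we simply hand it over.
        guess = target
        # Close a fraction of the remaining gap, so the last step lands on the guess.
        remaining = steps - step
        image = image + (guess - image) / remaining
        errors.append(float(np.abs(image - target).mean()))
    return image, errors

# An 8x8 "picture": a bright square on a dark background.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

image, errors = toy_denoise(target)
print(f"average error after first step: {errors[0]:.3f}")
print(f"average error after last step:  {errors[-1]:.3f}")
```

Each pass shrinks the gap between the static and the picture, which is the "a little more organized each time" behavior in miniature.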

How It Learned: Add Noise, Then Reverse It

The clever part is how the model learned to do this. During training, it was shown millions of real images, and for each one it did the process backwards:

  1. Take a real photo (say, a dog).
  2. Gradually add noise to it, step by step, until it's pure static. The model is shown the noisier and noisier versions along the way.
  3. Learn to undo each step: given a noisy version, predict what it looked like one step less noisy.

Do that across millions of images and the model becomes an expert at one thing: taking a noisy image and making it slightly cleaner. To generate a new image, you just start it at the end — pure noise — and let it run its cleanup process. Because it learned from real images, the "clean" version it heads toward looks like a real image too.
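Here is a minimal sketch of the "add noise" half of training. The article doesn't say which recipe any particular product uses, so treat the details as assumptions: the 1,000-step linear schedule below follows the widely used DDPM setup, and `add_noise` is a hypothetical helper name. In that setup the network is trained to predict the `noise` array from the `noisy` image, which is what lets it undo a step later.

```python
import numpy as np

def add_noise(clean, t, num_steps=1000, seed=0):
    """Return a noisy copy of `clean` at step t (0 = barely noisy, num_steps - 1 = nearly pure static)."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: how much noise each step mixes in (DDPM-style linear ramp).
    betas = np.linspace(1e-4, 0.02, num_steps)
    # Fraction of the original image's signal that survives after t + 1 steps.
    signal_kept = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(clean.shape)
    noisy = np.sqrt(signal_kept) * clean + np.sqrt(1.0 - signal_kept) * noise
    return noisy, noise, signal_kept

clean = np.ones((4, 4))  # stand-in for a real photo
for t in (0, 500, 999):
    _, _, kept = add_noise(clean, t)
    print(f"step {t}: signal kept = {kept:.5f}")
```

Early steps keep almost all of the photo, and by the last step almost none of it survives, so the model gets practice at every noise level in between.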

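Phase                | What the model sees                                  | What it learns or produces
---------------------|------------------------------------------------------|--------------------------------
Training (backward)  | Real image → slowly add noise → static               | How to reverse one step of noise
Generating (forward) | Start from pure static → slowly remove noise → image | Produces a brand-new image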

Where Your Words Come In

Left alone, the model would denoise toward some plausible image, but not necessarily what you want. Your text prompt is the steering wheel.

The words "a red bicycle on a beach at sunset" get turned into numbers the model understands (the same kind of meaning-coordinates used in embeddings). At every denoising step, the model nudges the image not just toward "a realistic picture" but toward "a realistic picture that matches these words." Stronger guidance pulls the result closer to your prompt, usually at the cost of some variety, while more steps mainly give the model more chances to refine detail.
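One common way to implement that steering is called classifier-free guidance. The article doesn't name a method, so this is an assumption about how many (not necessarily all) generators do it: at each step the model makes two predictions, one ignoring the prompt and one using it, and the difference between them is exaggerated. The function name and the example numbers below are illustrative only.

```python
import numpy as np

def guided_prediction(pred_without_prompt, pred_with_prompt, guidance_scale):
    """Blend two predictions, pushing further in the direction the prompt pulls."""
    prompt_direction = pred_with_prompt - pred_without_prompt
    return pred_without_prompt + guidance_scale * prompt_direction

# Made-up two-number "predictions" standing in for full image-sized arrays.
without_prompt = np.array([0.0, 0.0])
with_prompt = np.array([1.0, -1.0])

for scale in (0.0, 1.0, 7.5):
    print(scale, guided_prediction(without_prompt, with_prompt, scale))
```

A scale of 0 ignores the prompt entirely, 1 uses the prompted prediction as-is, and larger values lean harder on the prompt.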

Why It Used to Mess Up Hands

The infamous six-fingered hands weren't random; they're a direct clue to how diffusion works. The model never learned "a hand has exactly five fingers" as a rule. It learned what hands tend to look like: pinkish, with several finger-shaped protrusions. Since it builds the image from blurry noise into detail, and hands in training photos appear in countless poses with different numbers of fingers visible, it often settled on "about the right number" of fingers rather than exactly five.

Modern models (2026) mostly fixed this with better training and more parameters — but the lesson holds: these models reproduce statistical patterns, not hard rules. They're brilliant at vibes, historically shaky on exact counts, text in images, and rigid geometry.

Why the Same Prompt Gives Different Images

Each generation starts from a different patch of random noise, picked by a number called a "seed". Different starting static, denoised toward the same prompt, lands on a different final image, the same way two sculptors handed different marble blocks would carve slightly different statues of the same subject. Lock the seed, keep the same model and settings, and you can reproduce the same image; change the seed and you get fresh variations.
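The seed idea is easy to see with an ordinary random number generator. This sketch uses NumPy rather than any specific image tool, and `starting_noise` is a hypothetical helper: the same seed always produces the same starting static, and a different seed produces different static.

```python
import numpy as np

def starting_noise(seed, shape=(4, 4)):
    """Generate the initial static for one run from a seed."""
    return np.random.default_rng(seed).standard_normal(shape)

same_a = starting_noise(42)
same_b = starting_noise(42)
different = starting_noise(43)

print(np.array_equal(same_a, same_b))     # True: same seed, identical static
print(np.array_equal(same_a, different))  # False: new seed, new static
```

Since every later step is computed from that starting static, fixing it is what makes a run repeatable.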

How AI Video Builds on This

AI video (Sora, Veo, and others) extends diffusion across time: it denoises many frames at once while trying to keep them consistent from one to the next. That consistency is the hard part, and it's exactly why AI video sometimes flickers, morphs objects, or drifts in physics. Every frame starts as noise, and the model only approximately keeps each one in agreement with its neighbors. Those tiny inconsistencies are, conveniently, also how you can often spot an AI-generated clip.

Frequently Asked Questions

How does AI image generation actually work?

It uses a technique called diffusion. The model starts with a field of random visual noise and removes that noise step by step — usually 20 to 50 times — nudging the image toward something realistic that matches your text prompt, until a finished picture emerges.

What is diffusion in AI?

Diffusion is the process of turning random noise into a coherent image by repeatedly "denoising" it. The model learned this by watching millions of real images get progressively corrupted into static, then learning to reverse each step. To make new images, it runs that reversal starting from pure noise.

Why does AI image generation get hands and text wrong?

Because the model learned statistical patterns of what things look like, not hard rules like "hands have five fingers" or how letters form words. It builds images from blurry to sharp, so exact counts, text, and rigid geometry are historically weak spots — though 2026 models have improved a lot.

Why do I get a different image each time with the same prompt?

Each run starts from a different patch of random noise, picked by a number called a seed. Denoising different starting static toward the same prompt produces different final images. If you fix the seed and keep the same model and settings, you can reproduce the same image.

Is AI image generation the same as ChatGPT?

No. ChatGPT is a language model that predicts text. Image generators use diffusion models that denoise images. They're different architectures for different jobs, though both turn your words into numbers to guide the output.
