How Does an LLM Actually Work? (ChatGPT and Claude, Explained Without Math)

A large language model (LLM) like ChatGPT or Claude works by doing one simple thing astonishingly well: predicting the next word. You give it some text, and it picks a likely next chunk of text, adds it, then predicts again, one small chunk at a time, until the answer is complete. Everything else (the essays, the code, the eerily good advice) is that single trick, repeated over and over by a system that has read an enormous slice of the internet.

That sounds too simple to explain something that can write a poem or debug your code. The magic isn't in the trick; it's in how good a machine gets at the trick after reading a staggering amount of what humans have written. Here's the whole thing, no math.

The One Job: Predict the Next Token
What's a Token?
How It Learned: Training in Three Phases
Why It Seems to "Understand" — Attention
Why It Confidently Makes Things Up
What an LLM Is Not
Frequently Asked Questions

The One Job: Predict the Next Token

Imagine the world's most well-read autocomplete. You type "The capital of France is," and it has seen that phrase followed by "Paris" so many times that "Paris" is overwhelmingly the most likely next word. So it writes "Paris."

An LLM does exactly this, but for any text. Ask it to write an email, and it predicts the most plausible next word given everything so far — the instruction, the tone, the words it has already written. It generates the response one small piece at a time, each piece feeding back in to inform the next. There's no separate "thinking" step hiding behind the words. The generating is the thinking.
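To make the loop concrete, here is a minimal sketch in Python. It is a toy, not how ChatGPT or Claude are built: the "model" just counts which word followed which in a three-sentence corpus made up for this example, whereas real LLMs use a neural network trained on tokens. The shape of the loop is the point: predict one piece, append it, feed the longer text back in.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus" (real models read trillions of tokens).
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of france is paris ."
).split()

# Toy "training": count which word follows each word in the corpus.
next_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if never seen."""
    if word not in next_counts:
        return None
    return next_counts[word].most_common(1)[0][0]

def generate(prompt, max_new_words=5):
    """Predict one word, append it, and feed the longer text back in."""
    words = prompt.split()
    for _ in range(max_new_words):
        nxt = predict_next(words[-1]) if words else None
        if nxt is None or nxt == ".":
            break  # nothing learned to say next, or the sentence ended
        words.append(nxt)
    return " ".join(words)

print(generate("the capital of france is"))  # the capital of france is paris
```

Because this toy only looks at the last word, `generate("the capital of italy is")` also ends in "paris". Real LLMs condition on the whole text so far, which is where attention comes in.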

The reason it's not just dumb autocomplete is the sheer scale of what it learned from. To predict the next word well across science, code, law, jokes, and recipes, it had to internalize patterns that look a lot like knowledge and reasoning.

What's a Token?

The model doesn't actually work in words — it works in tokens, which are chunks of text. A token might be a whole word ("cat"), part of a word ("ing"), or a piece of punctuation. On average, one token is about ¾ of an English word.
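Here is a rough illustration of words splitting into pieces. This sketch assumes a tiny, invented vocabulary and uses greedy longest-match splitting, which is a simplification: real tokenizers (byte-pair encoding is a common family) learn tens of thousands of pieces from data and often attach the leading space to the word that follows it.

```python
# A tiny invented vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = {"cat", "s", " ", "jump", "ing", "!", "un", "believ", "able"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match: repeatedly take the longest known piece at the front."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

print(tokenize("cats jumping!"))  # ['cat', 's', ' ', 'jump', 'ing', '!']
print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
```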

Two reasons this matters to you:

It's why AI APIs are priced per token, and why a long document costs more to process. (Full breakdown in our guide to what a token is.)
It's why the model has a memory limit. The text it can "see" at once (your prompt, the conversation so far, and its answer) is measured in tokens and called the context window. Run past it and the earliest text typically falls out of view. (See why AI forgets what you talked about.)
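Both points come down to simple counting. The sketch below uses the rough ¾-word rule to estimate a token count, then shows what "falling away" means: only the most recent tokens that fit in the window are kept. Splitting on spaces stands in for a real tokenizer here, the window size of 5 is invented (real windows hold many thousands of tokens), and real chat apps may trim or summarize older text in their own ways.

```python
def estimate_tokens(word_count):
    """Rule of thumb: one token is about 3/4 of an English word."""
    return round(word_count / 0.75)

def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit; the earliest ones fall away."""
    if context_window <= 0:
        return []
    return tokens[-context_window:]

print(estimate_tokens(750))  # 1000

# Splitting on spaces is a stand-in for real tokenization.
conversation = "hello my name is Ada . what is my name ?".split()
print(fit_to_context(conversation, 5))  # ['what', 'is', 'my', 'name', '?']
```

With a 5-token window, "Ada" is no longer visible, so the question about the name can't be answered from the remaining text.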

How It Learned: Training in Three Phases

A model isn't programmed with facts. It's trained — shown enormous amounts of text and adjusted until it gets good at prediction. This happens in three stages:

Phase	What happens	The result
1. Pre-training	Read a huge chunk of the internet, books, and code, predicting the next token trillions of times	Raw knowledge and language ability — but unfocused
2. Fine-tuning	Train on curated examples of good question-and-answer behavior	Learns to be a helpful assistant, not just an autocomplete
3. Alignment (RLHF)	Humans rate answers; the model is nudged toward the preferred ones	Learns to be more helpful, honest, and safe

Pre-training is the giant, expensive marathon (thousands of GPUs running for weeks or months, which is part of why AI uses so much energy). Fine-tuning and alignment are what turn a raw text predictor into ChatGPT or Claude: the difference between a library and a librarian.
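What "getting good at prediction" means can be put in numbers. In pre-training, the standard yardstick is cross-entropy loss: at each position, the penalty is the negative log of the probability the model gave to the token that actually came next. The probabilities below are invented to show the direction of change, not taken from any real model; training adjusts the model's internal numbers so this penalty falls on average across the training text.

```python
import math

def next_token_loss(predicted_probs, actual_next):
    """Penalty for one prediction: small when the real next token got high probability."""
    return -math.log(predicted_probs[actual_next])

# Invented predictions for the word after "The capital of France is".
before_training = {"Paris": 0.10, "London": 0.45, "banana": 0.45}
after_training = {"Paris": 0.90, "London": 0.08, "banana": 0.02}

print(round(next_token_loss(before_training, "Paris"), 2))  # 2.3
print(round(next_token_loss(after_training, "Paris"), 2))   # 0.11
```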

Why It Seems to "Understand" — Attention

The breakthrough that made modern LLMs possible is a mechanism called attention. When the model reads your text, attention lets each word "look at" the words that came before it and decide which ones matter for what comes next.

Take: "The trophy didn't fit in the suitcase because it was too big." What does "it" refer to? You instantly know it's the trophy. Attention is how the model figures that out — it weighs the connection between "it" and every earlier word, and "trophy" wins. Do that across thousands of words and you get something that tracks context, references, and intent well enough to feel like understanding.
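A stripped-down sketch of the scoring step, assuming made-up two-number vectors for each word. Real models learn vectors with hundreds or thousands of numbers, compute separate "query" and "key" versions through learned weights, scale the scores, and run many attention heads in parallel. The core move is the same: score how well each earlier word matches what "it" is looking for, then turn the scores into weights that sum to 1 with a softmax.

```python
import math

# Invented, purely illustrative vectors for a few earlier words.
# Real models learn much longer vectors from data.
key_vectors = {
    "trophy":   [0.9, 0.1],
    "fit":      [0.1, 0.2],
    "suitcase": [0.4, 0.8],
    "big":      [0.2, 0.1],
}
# What "it" is looking for here: something that can be too big (made up).
query_it = [1.0, 0.0]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Score each earlier word against the query (dot product), then normalize."""
    words = list(keys)
    scores = [sum(q * k for q, k in zip(query, keys[w])) for w in words]
    return dict(zip(words, softmax(scores)))

weights = attend(query_it, key_vectors)
print(max(weights, key=weights.get))  # trophy
```

In a real model these weights then decide how much information from each earlier word gets blended into the representation of "it".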

It isn't understanding the way you do: there's no inner life, no beliefs. But "a machine that has learned which words relate to which, across a vast body of human writing" is a genuinely powerful thing, and it explains most of what these models can do.

Why It Confidently Makes Things Up

Here's the catch that follows directly from the one job. The model is optimizing for plausible, not true. When it doesn't know something, it doesn't have a built-in "I'm not sure" signal: it just predicts the most likely-sounding next words, which can add up to a confidently worded wrong answer. That's a hallucination, and it's not a bug bolted on; it's the flip side of being a fluent prediction machine.
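A toy illustration of why the missing "I'm not sure" signal matters. The probabilities are invented for a question about a hypothetical town the model knows nothing about: no option is well supported, yet picking the top one produces a sentence that reads just as confidently as a correct answer would.

```python
# Invented probabilities for the founding year of a hypothetical town.
# No option is well supported, but one is still the "top" pick.
candidates = {"1887": 0.26, "1892": 0.27, "1901": 0.24, "1910": 0.23}

def answer(probs):
    """Pick the most likely option; the wording is the same whatever the confidence."""
    best = max(probs, key=probs.get)
    return f"The town was founded in {best}."

print(answer(candidates))        # The town was founded in 1892.
print(max(candidates.values()))  # 0.27, yet none of that doubt reaches the sentence
```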

We dig into the why in why ChatGPT makes stuff up and why AI sounds confident when it's wrong. The practical takeaway: trust it like a brilliant, fast, slightly overconfident intern — verify anything that matters.

What an LLM Is Not

A few myths worth killing:

It's not a database. It didn't store the internet; it learned patterns from it. It can't reliably "look up" an exact fact unless it's given tools to do so.
It's not conscious or thinking between messages. It's stateless — it does nothing until you send a prompt, then predicts, then stops.
It's not deterministic by default. Ask the same thing twice and you can get different wording, because it samples from the likely options rather than always picking the single top one.
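The last point is easy to see in code. The scores below are invented. `greedy` always takes the top option, while `sample` picks in proportion to likelihood, with a `temperature` knob (similar in spirit to the temperature setting many model APIs expose) that flattens or sharpens the odds. In this sketch, temperature must be greater than 0.

```python
import math
import random

# Invented scores for possible next words after "The weather today is".
logits = {"sunny": 2.0, "nice": 1.5, "cloudy": 1.0, "purple": -2.0}

def greedy(logits):
    """Always pick the single top option: same input, same output."""
    return max(logits, key=logits.get)

def sample(logits, temperature=1.0, rng=random):
    """Pick randomly in proportion to likelihood; higher temperature = more variety."""
    words = list(logits)
    weights = [math.exp(logits[w] / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

print(greedy(logits))  # sunny
rng = random.Random(0)  # seeded only so this demo is repeatable
print([sample(logits, temperature=1.0, rng=rng) for _ in range(5)])
```

Change the seed and the sampled list changes; the greedy pick never does. Very low temperatures make `sample` behave almost exactly like `greedy`.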

Once the "next-token predictor trained on everything" picture clicks, almost every quirk of AI — the brilliance, the confidence, the hallucinations — starts to make sense. From here, the natural next steps are how AI image generation works (a different kind of model entirely) and how AI agents work (what happens when you give an LLM tools and a goal).

Frequently Asked Questions

How does an LLM actually work in simple terms?

An LLM predicts the next chunk of text (a "token") based on everything written so far, then repeats that prediction over and over to build a full response. It learned to do this by training on enormous amounts of text, which taught it the patterns of language, facts, and reasoning well enough to produce useful answers.

Is ChatGPT just predicting the next word?

Yes, fundamentally — but that undersells it. Predicting the next word well across every topic humans write about requires internalizing grammar, facts, and reasoning patterns. The simplicity of the mechanism plus the scale of the training is exactly what makes it powerful.

Does an LLM understand what it's saying?

Not the way humans do. It has no beliefs or inner experience. What looks like understanding is a learned model of how words and concepts relate, powered by a mechanism called attention. It's good enough to be genuinely useful, but it's pattern prediction, not comprehension.

Why do LLMs make mistakes or "hallucinate"?

Because they optimize for plausible-sounding text, not verified truth. When the model doesn't know something, it still predicts the most likely-sounding answer, which can be confidently wrong. Hallucination is a side effect of how the technology works, not a simple glitch that can be patched out; it can be reduced but not fully eliminated, so verify anything important.

What's the difference between an LLM and an AI agent?

An LLM is the text-prediction engine. An AI agent wraps that engine in a loop and gives it tools (search, code execution, APIs) plus a goal, so it can take actions and work toward an outcome instead of just answering once. See how AI agents work.
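A minimal sketch of that loop, with a hypothetical `fake_llm` standing in for a real model and a toy `calculator` as the only tool. The text commands (`CALL ...`, `FINAL ...`) are an invented protocol for illustration; real agent frameworks and model APIs use their own structured tool-calling formats.

```python
# Hypothetical stand-in for a real model: asks for the calculator once,
# then answers using the tool's result.
def fake_llm(history):
    tool_results = [h for h in history if h.startswith("TOOL RESULT:")]
    if not tool_results:
        return "CALL calculator 17*23"
    return "FINAL The answer is " + tool_results[-1].split(":", 1)[1].strip()

def calculator(expression):
    """A tiny toy tool: multiplies two integers written as 'a*b'."""
    a, b = expression.split("*")
    return str(int(a) * int(b))

TOOLS = {"calculator": calculator}

def run_agent(goal, llm, tools, max_steps=5):
    """Loop: ask the model what to do, run any tool it requests, repeat until done."""
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        reply = llm(history)
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        _, tool_name, argument = reply.split(" ", 2)  # "CALL <tool> <argument>"
        history.append(f"TOOL RESULT: {tools[tool_name](argument)}")
    return "Stopped: step limit reached."

print(run_agent("What is 17 times 23?", fake_llm, TOOLS))  # The answer is 391
```

The step limit matters in practice: without it, a model that keeps requesting tools would loop forever.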