What Is a Token in AI? (Why ChatGPT Charges by the Token)

A token is a chunk of text — on average about ¾ of an English word — and it's the basic unit an AI model reads, writes, remembers, and charges for. The model never sees "words" or "letters" the way you do; it sees a stream of tokens. That single fact explains three things people find confusing about AI: why it's priced per token, why it has a memory limit, and why a wall of text can suddenly get expensive or truncated.

If you use AI tools — and especially if you build with them — understanding tokens is the difference between guessing at your costs and controlling them.

What a Token Actually Is
Why Not Just Use Words?
Tokens Are the Unit of Memory
Tokens Are the Unit of Money
How to Use Fewer Tokens
Frequently Asked Questions

What a Token Actually Is

When you send text to an AI, the first thing that happens is tokenization: the text gets sliced into tokens. A token can be:

a whole common word: cat, the, house
a piece of a longer or rarer word: token might split into tok + en
a space plus a word: " running" (with its leading space) is often one token
a punctuation mark: . or ,

A rough rule of thumb: 1 token ≈ ¾ of a word, or about 4 characters of English. So 1,000 tokens is roughly 750 words. (Other languages and code tokenize differently — often less efficiently.)
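The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that only applies the two ratios from the text (about 4 characters or ¾ of a word per token); the function names are illustrative, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

# Rule-of-thumb ratios taken from the text above; real tokenizers vary.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count from character length (about 4 chars per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate the token count from a word count (about 3/4 word per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)
```

For example, estimate_words_from_tokens(1000) gives 750, matching the figure above. Treat the result as a budgeting estimate, not a billing number.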

The model reads these tokens, and when it answers, it generates one token at a time, each one predicted from all the tokens before it.

Why Not Just Use Words?

Two reasons tokens beat whole words:

Vocabulary size. There are millions of possible words (plus typos, names, and code). A fixed set of roughly 100,000 tokens (the exact size varies by model) can build any of them by combining pieces, like an alphabet that has far more than 26 letters but far fewer entries than a list of every word.
Rare and new words. The model has never seen your username or a brand-new term, but it can still handle them by breaking them into familiar sub-pieces. Nothing is truly "out of vocabulary."

This is why a long, rare word like "antidisestablishmentarianism" costs several tokens while "dog" costs one.
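To make the idea of building words from pieces concrete, here is a toy sketch. It assumes a tiny hand-picked vocabulary and uses greedy longest-match splitting, which is a simplification: production tokenizers learn their vocabulary from data (commonly with byte-pair encoding) and will split words differently. The names TOY_VOCAB and toy_tokenize are hypothetical.

```python
# Hypothetical toy vocabulary; real vocabularies are learned from data
# and hold on the order of 100,000 pieces.
TOY_VOCAB = frozenset({"dog", "anti", "dis", "establish", "ment", "arian", "ism"})


def toy_tokenize(word: str, vocab: frozenset[str] = TOY_VOCAB) -> list[str]:
    """Split a word into the longest known pieces, scanning left to right.

    Any character not covered by the vocabulary becomes its own token,
    so no input is ever out of vocabulary.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens
```

With this toy vocabulary, "dog" stays as one token, "antidisestablishmentarianism" splits into six pieces (anti, dis, establish, ment, arian, ism), and an unknown string such as "xyz" falls back to single characters.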

Tokens Are the Unit of Memory

Every model has a context window: the maximum number of tokens it can consider at once, counting both your input and its output. In 2026, frontier models like Claude Opus and GPT-5.5 offer context windows around 1 million tokens — roughly a 750,000-word library in a single conversation.

But it's still a limit. When a conversation exceeds the window, something has to go: most chat apps drop (or summarize) the oldest tokens, which is why AI seems to forget what you said earlier. Everything the model "knows" in the moment has to fit in that token budget.
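One way an application might enforce that budget is to keep only the most recent messages that fit. The sketch below is an assumption about how such trimming could work, not a description of any particular product; it uses the 4-characters-per-token rule of thumb in place of a real tokenizer, and the function names are illustrative.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (rule of thumb, not exact)."""
    return math.ceil(len(text) / 4)


def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits in max_tokens.

    Walks backward from the newest message and stops at the first one
    that would overflow the budget, so older messages are the ones dropped.
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Given three messages of about 10 tokens each and a 25-token budget, the first message is dropped and the two most recent survive. Real systems often summarize the dropped turns instead of discarding them outright.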

Tokens Are the Unit of Money

AI APIs bill per token, usually with two different prices:

Token type	What it is	Relative price
Input tokens	The text you send (prompt, documents, history)	Cheaper
Output tokens	The text the model generates back	More expensive

So cost scales with how much text goes in and comes out. Pasting a 50-page document into every request, or keeping a giant chat history, quietly runs up the bill because all of it is re-sent as input tokens each turn. This is the single biggest lever on what AI actually costs you. (We go deep on this in our guide on how to cut Claude Code token costs.)

It's also why 2026 efficiency research matters so much: methods that compress how tokens are stored and processed cut the real cost per query. (See the research digest.)
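The billing arithmetic in this section can be sketched in a few lines. The prices below are hypothetical placeholders, not any provider's actual rates, and the conversation model is deliberately simplified: it assumes every turn adds the same number of tokens and re-sends the full history as input.

```python
# Hypothetical prices in dollars per million tokens; check your provider's price list.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request at the assumed prices above."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


def conversation_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the full history.

    Simplification: each turn adds tokens_per_turn to the history (user
    message plus model reply combined), and turn k sends k * tokens_per_turn.
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))
```

Under these assumptions, a 10-turn chat at 500 tokens per turn bills 27,500 input tokens in total, not 5,000, because the history is paid for again on every turn. That growth is why trimming history matters more than shaving a few words off a single prompt.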

How to Use Fewer Tokens

Practical ways to keep token usage — and cost — down:

Send only what's needed. Don't paste an entire document if a relevant section will do.
Trim history. Long chat threads re-send everything each turn; start fresh when the topic changes.
Be concise in prompts. Clear and short beats long and rambling — and costs less.
Use the right-sized model. A smaller, cheaper model often handles routine tasks fine; save the expensive one for hard problems.
For documents, use retrieval. Instead of stuffing everything in, fetch only the relevant chunks — that's what RAG and embeddings do.
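The retrieval idea in the last tip can be illustrated with a deliberately crude sketch. It ranks chunks by how many words they share with the question; this word-overlap score is only a stand-in for the embedding similarity a real RAG system would use, and the function names are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question.

    Word overlap is a stand-in for embedding similarity; a real RAG
    system would compare vectors instead.
    """
    query = words(question)
    # sorted() is stable, so chunks with equal scores keep their original order.
    ranked = sorted(chunks, key=lambda c: len(query & words(c)), reverse=True)
    return ranked[:k]
```

Only the returned chunks go into the prompt, so the input token count depends on k and the chunk size, not on the size of the whole document.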

Frequently Asked Questions

What is a token in AI, simply?

A token is a chunk of text — on average about ¾ of a word — that an AI model uses as its basic unit. The model reads your text as a sequence of tokens and generates its response one token at a time. Tokens are also how AI usage is measured and billed.

How many words is a token?

Roughly: 1 token ≈ ¾ of an English word, or about 4 characters. So 1,000 tokens is around 750 words. Code and non-English languages often use more tokens for the same content because they tokenize less efficiently.

Why does ChatGPT charge per token?

Because tokens are the actual unit of work the model processes. Both the text you send (input tokens) and the text it generates (output tokens) consume computation, so pricing per token closely tracks cost. Output tokens usually cost more than input tokens. Strictly speaking, per-token billing applies to the API; the consumer ChatGPT app is sold as a subscription with usage limits.

What is a context window?

It's the maximum number of tokens a model can consider at once, including both your input and its output. In 2026, top models reach around 1 million tokens. When a conversation exceeds the window, the oldest tokens are typically dropped, which is why AI can forget earlier parts of a long chat.

How do I reduce my AI token costs?

Send only the text that's needed, trim long conversation histories, write concise prompts, use a smaller model for routine tasks, and use retrieval (RAG) to fetch only relevant document chunks instead of pasting everything into the prompt.