What Is a Token in AI? (Why ChatGPT Charges by the Token)
A token is a chunk of text — roughly ¾ of a word — and it's the unit AI models read, generate, remember, and bill by. Here's what a token actually is and why it controls your AI costs and limits.
A token is a chunk of text — on average about ¾ of an English word — and it's the basic unit an AI model reads, writes, remembers, and charges for. The model never sees "words" or "letters" the way you do; it sees a stream of tokens. That single fact explains three things people find confusing about AI: why it's priced per token, why it has a memory limit, and why a wall of text can suddenly get expensive or truncated.
If you use AI tools — and especially if you build with them — understanding tokens is the difference between guessing at your costs and controlling them.
Table of Contents
- What a Token Actually Is
- Why Not Just Use Words?
- Tokens Are the Unit of Memory
- Tokens Are the Unit of Money
- How to Use Fewer Tokens
- Frequently Asked Questions
What a Token Actually Is
When you send text to an AI, the first thing that happens is tokenization: the text gets sliced into tokens. A token can be:
- a whole common word: "cat", "the", "house"
- a piece of a longer or rarer word: "token" might split into "tok" + "en"
- a space-plus-word: " running" (with its leading space) is often one token
- a punctuation mark: "." or ","
A rough rule of thumb: 1 token ≈ ¾ of a word, or about 4 characters of English. So 1,000 tokens is roughly 750 words. (Other languages and code tokenize differently — often less efficiently.)
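The rule of thumb is easy to turn into a quick estimator. The sketch below is only an approximation for English text, and the function names are made up for illustration; a real count requires running the model's actual tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # 1 token is roughly 3/4 of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate token count from character length (about 4 chars per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count from a word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_words_from_tokens(1000))   # 750
print(estimate_tokens_from_words(750))    # 1000
```

Treat the result as a ballpark figure: code and non-English text will usually come out higher than this estimate suggests.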
The model reads these tokens, and when it answers, it generates one token at a time, each one predicted from all the tokens before it.
Why Not Just Use Words?
Two reasons tokens beat whole words:
- Vocabulary size. There are millions of possible words (plus typos, names, and code). A fixed set of ~100,000 tokens can build any of them by combining pieces — like an alphabet that's bigger than 26 letters but smaller than every word.
- Rare and new words. The model has never seen your username or a brand-new term, but it can still handle them by breaking them into familiar sub-pieces. Nothing is truly "out of vocabulary."
This is why a long, rare word like "antidisestablishmentarianism" costs several tokens while "dog" costs one.
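To make the "familiar sub-pieces" idea concrete, here is a toy tokenizer. It is a simplified stand-in, not how any production tokenizer works: real ones learn their vocabulary from data (byte-pair encoding is a common approach), while this sketch uses a tiny hand-picked vocabulary (`TOY_VOCAB`, invented for this example) and a greedy longest-match rule, falling back to single characters so that nothing is ever out of vocabulary.

```python
from __future__ import annotations


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    max_len = max((len(piece) for piece in vocab), default=1)
    while i < len(text):
        # Try the longest candidate first, then shrink until a known piece matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary, chosen by hand for illustration only.
TOY_VOCAB = {"dog", "tok", "en", "anti", "dis", "establish", "ment", "arian", "ism"}

print(tokenize("dog", TOY_VOCAB))
# ['dog']
print(tokenize("antidisestablishmentarianism", TOY_VOCAB))
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

The common word comes out as one token and the long word as six, and a string the vocabulary has never seen (say, "zq") still tokenizes, just one character at a time.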
Tokens Are the Unit of Memory
Every model has a context window: the maximum number of tokens it can consider at once, counting both your input and its output. In 2026, frontier models like Claude Opus and GPT-5.5 offer context windows around 1 million tokens — roughly a 750,000-word library in a single conversation.
But it's still a limit. When a conversation exceeds the window, something has to go: most chat apps drop or summarize the oldest messages, which is why AI seems to forget what you said earlier. Everything the model "knows" in the moment has to fit in that token budget.
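One way an application might keep a conversation inside the window is to keep the newest messages and drop the oldest. The sketch below assumes that simple strategy and uses the rough 4-characters-per-token estimate in place of a real tokenizer; `trim_to_window` and `estimate_tokens` are illustrative names, not part of any library.

```python
from __future__ import annotations

import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break                        # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()                       # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]   # about 10 tokens each
print(len(trim_to_window(history, max_tokens=25)))   # 2
```

With a 25-token budget only the two newest messages survive, and the first message is the one the model "forgets".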
Tokens Are the Unit of Money
AI APIs bill per token, usually with two different prices:
| Token type | What it is | Typically |
|---|---|---|
| Input tokens | The text you send (prompt, documents, history) | Cheaper |
| Output tokens | The text the model generates back | More expensive |
So cost scales with how much text goes in and comes out. Pasting a 50-page document into every request, or keeping a giant chat history, quietly runs up the bill because all of it is re-sent as input tokens each turn. This is the single biggest lever on what AI actually costs you. (We go deep on this in our guide to cutting Claude Code token costs.)
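A little arithmetic shows how re-sent history adds up. In the sketch below the prices are made-up placeholders, not any provider's actual rates, and the model of a chat (fixed-size messages, full history re-sent every turn) is a simplifying assumption.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def conversation_input_tokens(turns: int, tokens_per_user_msg: int,
                              tokens_per_reply: int) -> int:
    """Total input tokens billed when the full history is re-sent every turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_user_msg   # the new user message joins the history
        total += history                 # the whole history is sent as input
        history += tokens_per_reply      # the reply joins the history for next turn
    return total


# Placeholder prices (USD per million tokens), for illustration only.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

turns = 3
input_total = conversation_input_tokens(turns, tokens_per_user_msg=100, tokens_per_reply=200)
output_total = turns * 200
print(input_total)   # 1200
print(request_cost(input_total, output_total, INPUT_PRICE, OUTPUT_PRICE))
```

Three 100-token messages are billed as 1,200 input tokens, not 300, because each turn pays again for everything that came before it. The gap widens quickly as the conversation gets longer.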
It's also why 2026 efficiency research matters so much: methods that compress how tokens are stored and processed cut the real cost per query. (See the research digest.)
How to Use Fewer Tokens
Practical ways to keep token usage — and cost — down:
- Send only what's needed. Don't paste an entire document if a relevant section will do.
- Trim history. Long chat threads re-send everything each turn; start fresh when the topic changes.
- Be concise in prompts. Clear and short beats long and rambling — and costs less.
- Use the right-sized model. A smaller, cheaper model often handles routine tasks fine; save the expensive one for hard problems.
- For documents, use retrieval. Instead of stuffing everything in, fetch only the relevant chunks — that's what RAG and embeddings do.
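The retrieval idea can be sketched in a few lines. Real RAG systems score chunks with vector embeddings; the version below swaps in a much cruder stand-in (counting shared words) just to show the shape of "fetch only the relevant chunks". The function names and sample chunks are invented for illustration.

```python
from __future__ import annotations

import re


def words(text: str) -> set[str]:
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    question_words = words(question)
    # sorted() is stable, so chunks with equal scores keep their original order.
    ranked = sorted(chunks, key=lambda c: len(question_words & words(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
print(top_chunks("How many days do refunds take?", chunks, k=1))
# ['Refunds are issued within 14 days of purchase.']
```

Only the selected chunk goes into the prompt, so the request pays for one short passage instead of the whole document.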
Frequently Asked Questions
What is a token in AI, simply?
A token is a chunk of text — on average about ¾ of a word — that an AI model uses as its basic unit. The model reads your text as a sequence of tokens and generates its response one token at a time. Tokens are also how AI usage is measured and billed.
How many words is a token?
Roughly: 1 token ≈ ¾ of an English word, or about 4 characters. So 1,000 tokens is around 750 words. Code and non-English languages often use more tokens for the same content because they tokenize less efficiently.
Why does ChatGPT charge per token?
Because tokens are the actual unit of work the model processes. Both the text you send (input tokens) and the text it generates (output tokens) consume computation, so pricing per token directly reflects cost. Output tokens usually cost more than input tokens.
What is a context window?
It's the maximum number of tokens a model can consider at once, including both your input and its output. In 2026, top models reach around 1 million tokens. When a conversation exceeds the window, the oldest tokens drop off — which is why AI can forget earlier parts of a long chat.
How do I reduce my AI token costs?
Send only the text that's needed, trim long conversation histories, write concise prompts, use a smaller model for routine tasks, and use retrieval (RAG) to fetch only relevant document chunks instead of pasting everything into the prompt.