Build This Now
speedy_devv, koen_salo

What Is a Token in AI? (Why ChatGPT Charges by the Token)

A token is a chunk of text — roughly ¾ of a word — and it's the unit AI models read, generate, remember, and bill by. Here's what a token actually is and why it controls your AI costs and limits.

Stop configuring. Start building.

SaaS builder templates with AI orchestration.

Published Jun 13, 2026 · 7 min read

A token is a chunk of text — on average about ¾ of an English word — and it's the basic unit an AI model reads, writes, remembers, and charges for. The model never sees "words" or "letters" the way you do; it sees a stream of tokens. That single fact explains three things people find confusing about AI: why it's priced per token, why it has a memory limit, and why a wall of text can suddenly get expensive or truncated.

If you use AI tools — and especially if you build with them — understanding tokens is the difference between guessing at your costs and controlling them.

Table of Contents

  1. What a Token Actually Is
  2. Why Not Just Use Words?
  3. Tokens Are the Unit of Memory
  4. Tokens Are the Unit of Money
  5. How to Use Fewer Tokens
  6. Frequently Asked Questions

What a Token Actually Is

When you send text to an AI, the first thing that happens is tokenization: the text gets sliced into tokens. A token can be:

  • a whole common word: cat, the, house
  • a piece of a longer or rarer word: token might split into tok + en
  • a word together with its leading space: " running" (space included) is often one token
  • a punctuation mark: . or ,

A rough rule of thumb: 1 token ≈ ¾ of a word, or about 4 characters of English. So 1,000 tokens is roughly 750 words. (Other languages and code tokenize differently — often less efficiently.)
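The rule of thumb above is easy to turn into a quick estimator. The sketch below is only an approximation built on the two ratios from this article (about 4 characters or ¾ of a word per token); the function names are made up for illustration, and real counts depend on the specific model's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 3/4 of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate token count from character length (about 4 chars per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate token count from word count (about 0.75 words per token)."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)
```

For example, estimate_words(1000) gives 750, matching the figure above. Treat the result as a ballpark for budgeting, not a billing-grade count.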

The model reads these tokens, and when it answers, it generates one token at a time, each one predicted from all the tokens before it.

Why Not Just Use Words?

Two reasons tokens beat whole words:

  1. Vocabulary size. There are millions of possible words (plus typos, names, and code). A fixed set of ~100,000 tokens can build any of them by combining pieces — like an alphabet that's bigger than 26 letters but smaller than every word.
  2. Rare and new words. The model has never seen your username or a brand-new term, but it can still handle them by breaking them into familiar sub-pieces. Nothing is truly "out of vocabulary."

This is why a long, rare word like "antidisestablishmentarianism" costs several tokens while "dog" costs one.
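To see how a fixed set of pieces can cover any word, here is a toy tokenizer. It is a deliberate simplification: the vocabulary is a tiny hypothetical one, and the greedy longest-match rule stands in for the learned subword schemes (such as byte-pair encoding) that production tokenizers use, so real splits will differ.

```python
# Hypothetical toy vocabulary; real vocabularies hold roughly 100,000 entries.
TOY_VOCAB = {"dog", "tok", "en", "anti", "dis", "establish", "ment", "arian", "ism"}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split a word by repeatedly taking the longest vocabulary piece that matches.

    Falls back to a single character when nothing matches, so no input is
    ever "out of vocabulary".
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens
```

With this toy vocabulary, toy_tokenize("dog") returns one piece, toy_tokenize("token") returns tok and en, and "antidisestablishmentarianism" breaks into six pieces. An unfamiliar string like "zq" still works because it falls back to single characters.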

Tokens Are the Unit of Memory

Every model has a context window: the maximum number of tokens it can consider at once, counting both your input and its output. In 2026, frontier models like Claude Opus and GPT-5.5 offer context windows around 1 million tokens — roughly a 750,000-word library in a single conversation.

But it's still a limit. When a conversation exceeds the window, something has to give: most chat apps drop or summarize the oldest messages, which is why AI seems to forget what you said earlier. Everything the model "knows" in the moment has to fit in that token budget.
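One common way an application stays inside the window is to keep only the most recent messages that fit. The sketch below assumes a simple drop-the-oldest policy and the rough 4-characters-per-token estimate; both the policy and the helper names are illustrative assumptions, not how any particular product works.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest so the oldest messages are dropped first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Anything cut from the list is invisible to the model on the next turn, which is exactly the "forgetting" effect described above. In practice you would also reserve part of the budget for the model's reply, since output counts against the same window.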

Tokens Are the Unit of Money

AI APIs bill per token, usually with two different prices:

Token type      What it is                                        Typically
Input tokens    The text you send (prompt, documents, history)    Cheaper
Output tokens   The text the model generates back                 More expensive

So cost scales with how much text goes in and comes out. Pasting a 50-page document into every request, or keeping a giant chat history, quietly runs up the bill because all of it is re-sent as input tokens each turn. This is the single biggest lever on what AI actually costs you. (We go deep on this in our guide on how to cut Claude Code token costs.)
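A little arithmetic shows how re-sent history adds up. The prices in this sketch are hypothetical placeholders (check your provider's current price list), and the model of a chat is simplified: every turn re-sends everything said so far as input.

```python
# Hypothetical prices in dollars per million tokens; check your provider's price list.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed at separate rates."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


def conversation_cost(turns: list[tuple[int, int]]) -> float:
    """Total cost of a chat where each turn re-sends the whole history as input.

    Each turn is (new_input_tokens, output_tokens).
    """
    total = 0.0
    history = 0  # tokens accumulated from earlier turns
    for new_input, output in turns:
        total += request_cost(history + new_input, output)
        history += new_input + output
    return total
```

With these placeholder prices, three turns of 1,000 new input tokens and 500 output tokens cost about $0.045 in one growing thread, versus about $0.0315 as three independent requests. The gap widens with every extra turn because the history keeps growing.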

It's also why 2026 efficiency research matters so much: methods that compress how tokens are stored and processed cut the real cost per query. (See the research digest.)

How to Use Fewer Tokens

Practical ways to keep token usage — and cost — down:

  • Send only what's needed. Don't paste an entire document if a relevant section will do.
  • Trim history. Long chat threads re-send everything each turn; start fresh when the topic changes.
  • Be concise in prompts. Clear and short beats long and rambling — and costs less.
  • Use the right-sized model. A smaller, cheaper model often handles routine tasks fine; save the expensive one for hard problems.
  • For documents, use retrieval. Instead of stuffing everything in, fetch only the relevant chunks — that's what RAG and embeddings do.
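The retrieval tip can be sketched in a few lines. This version ranks chunks by plain word overlap with the question, which is a crude stand-in for the embedding similarity a real RAG setup would use; the function names and the scoring rule are assumptions made for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question.

    Word overlap is a crude stand-in for embedding similarity.
    """
    query = words(question)
    ranked = sorted(chunks, key=lambda chunk: len(query & words(chunk)), reverse=True)
    return ranked[:k]
```

Only the returned chunks go into the prompt, so the input token count tracks the size of the relevant passages rather than the size of the whole document.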

Frequently Asked Questions

What is a token in AI, simply?

A token is a chunk of text — on average about ¾ of a word — that an AI model uses as its basic unit. The model reads your text as a sequence of tokens and generates its response one token at a time. Tokens are also how AI usage is measured and billed.

How many words is a token?

Roughly: 1 token ≈ ¾ of an English word, or about 4 characters. So 1,000 tokens is around 750 words. Code and non-English languages often use more tokens for the same content because they tokenize less efficiently.

Why does ChatGPT charge per token?

Because tokens are the actual unit of work the model processes. Both the text you send (input tokens) and the text it generates (output tokens) consume computation, so pricing per token directly reflects cost. Output tokens usually cost more than input tokens.

What is a context window?

It's the maximum number of tokens a model can consider at once, including both your input and its output. In 2026, top models reach around 1 million tokens. When a conversation exceeds the window, the oldest content is typically dropped or summarized, which is why AI can forget earlier parts of a long chat.

How do I reduce my AI token costs?

Send only the text that's needed, trim long conversation histories, write concise prompts, use a smaller model for routine tasks, and use retrieval (RAG) to fetch only relevant document chunks instead of pasting everything into the prompt.
