Build This Now
speedy_devv, koen_salo

What Is a Token in AI? (Why ChatGPT Charges by the Token)

A token is a chunk of text — roughly ¾ of a word — and it's the unit AI models read, generate, remember, and bill by. Here's what a token actually is and why it controls your AI costs and limits.

Stop configuring. Start building.

A SaaS builder template with AI orchestration.

Published Jun 13, 2026 · 7 min read

A token is a chunk of text — on average about ¾ of an English word — and it's the basic unit an AI model reads, writes, remembers, and charges for. The model never sees "words" or "letters" the way you do; it sees a stream of tokens. That single fact explains three things people find confusing about AI: why it's priced per token, why it has a memory limit, and why a wall of text can suddenly get expensive or truncated.

If you use AI tools — and especially if you build with them — understanding tokens is the difference between guessing at your costs and controlling them.

Table of Contents

  1. What a Token Actually Is
  2. Why Not Just Use Words?
  3. Tokens Are the Unit of Memory
  4. Tokens Are the Unit of Money
  5. How to Use Fewer Tokens
  6. Frequently Asked Questions


What a Token Actually Is

When you send text to an AI, the first thing that happens is tokenization: the text gets sliced into tokens. A token can be:

  • a whole common word — cat, the, house
  • a piece of a longer or rarer word — token might split into tok + en
  • a word plus its leading space: " running" (space included) is often a single token
  • a punctuation mark — . or ,

A rough rule of thumb: 1 token ≈ ¾ of a word, or about 4 characters of English. So 1,000 tokens is roughly 750 words. (Other languages and code tokenize differently — often less efficiently.)
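The two rules of thumb above can be turned into a quick back-of-the-envelope estimator. This is a minimal sketch, assuming English prose: the 4-characters and ¾-word ratios are averages, and a given model's real tokenizer will produce different counts, especially for code or other languages.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate: about 4 characters of English text per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate: 1 token is about 3/4 of a word, so tokens = words / 0.75."""
    return round(word_count / 0.75)


def estimate_words_from_tokens(token_count: int) -> int:
    """Inverse rule of thumb: words are about 3/4 of the token count."""
    return round(token_count * 0.75)


if __name__ == "__main__":
    print(estimate_words_from_tokens(1_000))   # 750
    print(estimate_tokens_from_words(750))     # 1000
    sample = "The quick brown fox jumps over the lazy dog."
    print(estimate_tokens_from_chars(sample))  # 44 characters -> 11
```

Use an estimator like this only for budgeting; when the exact count matters, count with the tokenizer of the model you are calling.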

The model reads these tokens, and when it answers, it generates one token at a time, each one predicted from all the tokens before it.
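The one-token-at-a-time process can be sketched as a loop. This is a toy illustration, not how any real model computes predictions: the hypothetical `toy_predict` function uses a hand-written lookup table keyed on the entire prefix, standing in for the neural network, and `<end>` is a made-up stop token. What it does show faithfully is the shape of generation: every step sees all previous tokens, and each new token is appended before the next prediction.

```python
from typing import Callable, Sequence

# A "model" here is any function that looks at every token so far
# and returns the next one. Real LLMs compute this with a neural network.
Predictor = Callable[[Sequence[str]], str]

END = "<end>"  # hypothetical stop token for this sketch


def generate(predict_next: Predictor, prompt: list[str], max_new_tokens: int) -> list[str]:
    """Autoregressive generation: predict and append one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # sees ALL tokens so far
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[len(prompt):]  # return only the newly generated tokens


# Toy lookup table: full prefix -> next token (purely illustrative).
TOY_TABLE = {
    ("The", " cat"): " sat",
    ("The", " cat", " sat"): " down",
    ("The", " cat", " sat", " down"): ".",
}


def toy_predict(tokens: Sequence[str]) -> str:
    return TOY_TABLE.get(tuple(tokens), END)


if __name__ == "__main__":
    print(generate(toy_predict, ["The", " cat"], max_new_tokens=10))
    # [' sat', ' down', '.']
```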

Why Not Just Use Words?

Two reasons tokens beat whole words:

  1. Vocabulary size. There are millions of possible words (plus typos, names, and code). A fixed vocabulary of roughly 100,000 tokens (the exact size varies by model) can build any of them by combining pieces, like an alphabet that's bigger than 26 letters but much smaller than a list of every word.
  2. Rare and new words. The model has never seen your username or a brand-new term, but it can still handle them by breaking them into familiar sub-pieces. Nothing is truly "out of vocabulary."

This is why a long, rare word like "antidisestablishmentarianism" costs several tokens while "dog" costs one.
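Here is a simplified sketch of how subword splitting keeps anything from being "out of vocabulary." It assumes a tiny, hand-picked `TOY_VOCAB` and greedy longest-match splitting, both invented for illustration. Production tokenizers learn their vocabularies from data (byte-pair encoding is a common method) and typically fall back to raw bytes rather than characters, so real token boundaries will differ.

```python
# Hypothetical, hand-picked vocabulary for illustration only.
TOY_VOCAB = {"the", "cat", "dog", "tok", "en", "anti", "dis", "establish", "ment", "arian", "ism"}


def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right.

    If no piece matches at some position, emit that single character instead,
    so no input is ever "out of vocabulary".
    """
    pieces: list[str] = []
    max_len = max(len(p) for p in vocab)
    i = 0
    while i < len(word):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(word) - i), 0, -1):
            candidate = word[i:i + length]
            if candidate in vocab:
                pieces.append(candidate)
                i += length
                break
        else:
            # No vocabulary piece starts here: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


if __name__ == "__main__":
    print(tokenize("dog", TOY_VOCAB))
    # ['dog']
    print(tokenize("antidisestablishmentarianism", TOY_VOCAB))
    # ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

A common word costs one piece, a rare word costs several, and a string the vocabulary has never seen still gets through one character at a time.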

Tokens Are the Unit of Memory

Every model has a context window: the maximum number of tokens it can consider at once, counting both your input and its output. In 2026, frontier models like Claude Opus and GPT-5.5 offer context windows around 1 million tokens — roughly a 750,000-word library in a single conversation.

But it's still a limit. When a conversation exceeds the window, the oldest tokens fall off the edge — which is why AI seems to forget what you said earlier. Everything the model "knows" in the moment has to fit in that token budget.
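The "oldest tokens fall off the edge" behavior can be sketched as a function that keeps the newest messages that fit. This is a minimal sketch under stated assumptions: `Message`, `rough_token_count`, and `fit_to_window` are hypothetical names, token counts use the ~4-characters rule instead of a real tokenizer, and some tools summarize old messages rather than dropping them. Because the window covers both input and output, the sketch reserves part of it for the reply.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token (English rule of thumb), at least 1.
    return max(1, len(text) // 4)


def fit_to_window(history: list[Message], window: int, reserve_for_output: int) -> list[Message]:
    """Keep the newest messages whose tokens fit; drop the oldest first."""
    budget = window - reserve_for_output  # the window covers input AND output
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = rough_token_count(msg.text)
        if used + cost > budget:
            break  # this message and everything older falls off the edge
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Whatever this function drops is simply invisible to the model on the next turn, which is exactly what "forgetting" looks like from the outside.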

Tokens Are the Unit of Money

AI APIs bill per token, usually with two different prices:

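| Token type    | What it is                                     | Typically      |
|---------------|------------------------------------------------|----------------|
| Input tokens  | The text you send (prompt, documents, history) | Cheaper        |
| Output tokens | The text the model generates back              | More expensive |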

So cost scales with how much text goes in and comes out. Pasting a 50-page document into every request, or keeping a giant chat history, quietly runs up the bill because all of it is re-sent as input tokens each turn. This is the single biggest lever on what AI actually costs you. (We go deep on this in our guide to cutting Claude Code token costs.)
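The effect of re-sending history can be made concrete with a small cost model. This is a sketch with placeholder numbers: `INPUT_PRICE_PER_M` and `OUTPUT_PRICE_PER_M` are hypothetical rates per million tokens, not any provider's actual pricing, and it assumes the app sends the full conversation as input on every turn, as described above.

```python
# Hypothetical prices in US dollars per 1 million tokens. These are
# placeholders for illustration, not any provider's actual rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output are billed at different rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def chat_cost(turns: int, user_tokens: int, reply_tokens: int, system_tokens: int = 0) -> float:
    """Total cost of a chat where the whole history is re-sent as input every turn."""
    total = 0.0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens                         # new user message joins the history
        total += request_cost(history, reply_tokens)   # everything so far is sent again
        history += reply_tokens                        # the reply becomes history too
    return total
```

Because the input side grows every turn, total cost grows much faster than the number of turns. For example, with 200-token messages and 300-token replies, the 50th request alone sends 24,700 input tokens: the new 200-token message plus all 49 earlier exchanges.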

It's also why 2026 efficiency research matters so much: methods that compress how tokens are stored and processed cut the real cost per query. (See the research digest.)

How to Use Fewer Tokens

Practical ways to keep token usage — and cost — down:

  • Send only what's needed. Don't paste an entire document if a relevant section will do.
  • Trim history. Long chat threads re-send everything each turn; start fresh when the topic changes.
  • Be concise in prompts. Clear and short beats long and rambling — and costs less.
  • Use the right-sized model. A smaller, cheaper model often handles routine tasks fine; save the expensive one for hard problems.
  • For documents, use retrieval. Instead of stuffing everything in, fetch only the relevant chunks — that's what RAG and embeddings do.
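The retrieval idea in the last bullet can be sketched in a few lines: score each document chunk against the question and send only the best matches. As a loud assumption, this swaps in simple word-count vectors in place of the learned embedding vectors that real RAG systems use, so it runs without any model or library; the function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = [
        "Refunds are issued within 14 days of a return.",
        "Our office is closed on public holidays.",
        "Shipping to Europe takes 5 to 7 business days.",
    ]
    print(retrieve("When are refunds issued?", chunks, top_k=1))
    # ['Refunds are issued within 14 days of a return.']
```

Only the retrieved chunk goes into the prompt, so the input token count stays small no matter how large the full document set grows.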


Frequently Asked Questions

What is a token in AI, simply?

A token is a chunk of text — on average about ¾ of a word — that an AI model uses as its basic unit. The model reads your text as a sequence of tokens and generates its response one token at a time. Tokens are also how AI usage is measured and billed.

How many words is a token?

Roughly: 1 token ≈ ¾ of an English word, or about 4 characters. So 1,000 tokens is around 750 words. Code and non-English languages often use more tokens for the same content because they tokenize less efficiently.

Why does ChatGPT charge per token?

Because tokens are the actual unit of work the model processes. Both the text you send (input tokens) and the text it generates (output tokens) consume computation, so per-token pricing tracks what each request actually costs to run. Output tokens usually cost more than input tokens.

What is a context window?

It's the maximum number of tokens a model can consider at once, including both your input and its output. In 2026, top models reach around 1 million tokens. When a conversation exceeds the window, the oldest tokens drop off — which is why AI can forget earlier parts of a long chat.

How do I reduce my AI token costs?

Send only the text that's needed, trim long conversation histories, write concise prompts, use a smaller model for routine tasks, and use retrieval (RAG) to fetch only relevant document chunks instead of pasting everything into the prompt.



