Token Counter
Count tokens for any LLM.
Real-time token counting for 17 models: GPT-4.1, Claude, Gemini, DeepSeek, and more. No signup, no API key.
Model Comparison
| Model | Provider | Context | Input / 1M | Output / 1M |
|---|---|---|---|---|
| GPT-4.1 | OpenAI | 1M | $2 | $8 |
| GPT-4.1 Mini | OpenAI | 1M | $0.4 | $1.6 |
| GPT-4.1 Nano | OpenAI | 1M | $0.1 | $0.4 |
| GPT-4o | OpenAI | 128K | $2.5 | $10 |
| GPT-4o Mini | OpenAI | 128K | $0.15 | $0.6 |
| o3 | OpenAI | 200K | $10 | $40 |
| o4-mini | OpenAI | 200K | $1.1 | $4.4 |
| Claude Opus 4.7 | Anthropic | 200K | $15 | $75 |
| Claude Sonnet 4.6 | Anthropic | 200K | $3 | $15 |
| Claude Haiku 4.5 | Anthropic | 200K | $0.8 | $4 |
| Gemini 2.5 Pro | Google | 1M | $1.25 | $10 |
| Gemini 2.5 Flash | Google | 1M | $0.3 | $1.5 |
| Gemini 2.0 Flash | Google | 1M | $0.1 | $0.4 |
| Llama 4 Scout | Meta | 10M | Free | Free |
| Llama 4 Maverick | Meta | 1M | Free | Free |
Pricing as of April 2026. Verify current rates at each provider's pricing page before estimating production costs.
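To turn the per-million prices above into a per-request figure, multiply each token count by its rate and divide by one million. The sketch below copies a few rows from the table; the `PRICES` dictionary and the `estimate_cost` helper are illustrative names, not part of this tool.

```python
# USD per 1M tokens as (input, output), copied from a few rows of the table above.
# These are illustrative values; check the provider's pricing page before relying on them.
PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 Mini": (0.40, 1.60),
    "GPT-4o": (2.50, 10.00),
    "Gemini 2.5 Flash": (0.30, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# A request with 10,000 input tokens and 2,000 output tokens on GPT-4o.
cost = estimate_cost("GPT-4o", 10_000, 2_000)
print(f"${cost:.4f}")  # $0.0450
```

Output tokens are priced higher than input tokens for every paid model in the table, so a long response can dominate the bill even when the prompt is short.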
FAQ
Common questions.
How many tokens is 1,000 words?
About 1,333 tokens for standard English prose. One token equals roughly 0.75 words or 4 characters. Code, JSON, and technical text tokenize differently.
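The rule of thumb above is simple arithmetic. Here is a minimal sketch that applies both ratios; the function names are hypothetical, and the results are rough estimates for English prose, not tokenizer output.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: one token is about 0.75 English words
CHARS_PER_TOKEN = 4     # rule of thumb: one token is about 4 characters


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def tokens_from_chars(text: str) -> int:
    """Estimate tokens from the character length of a string."""
    return round(len(text) / CHARS_PER_TOKEN)


print(tokens_from_words(1000))  # 1333
```

For code, JSON, or non-English text, expect the real count to differ from either estimate, sometimes by a wide margin.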
What is a token in AI?
A token is the smallest chunk of text an AI model processes. GPT-4o breaks text into subword pieces using BPE. "unbelievable" might be three tokens: "un", "belie", "vable".
What is the token limit for GPT-4.1?
GPT-4.1 has a 1,000,000-token context window. GPT-4o supports 128,000 tokens. Both limits cover combined input and output.
Is Claude cheaper than GPT-4o per token?
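It depends on the tier. Claude Sonnet 4.6 costs $3 per million input tokens, slightly more than GPT-4o at $2.50, while Claude Haiku 4.5 is cheaper than GPT-4o. The cheapest paid models in the table above are GPT-4.1 Nano and Gemini 2.0 Flash at $0.10 per million input tokens. For balancing quality against cost, GPT-4.1 Mini ($0.40) and Claude Haiku 4.5 are reasonable starting points.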
How do I count tokens without calling the API?
Use this tool. It runs entirely in your browser, so no API key is needed and no data leaves your device. Counts for OpenAI models are exact because the tool uses a tiktoken-compatible library.
Why does the same text have different token counts across models?
Because each model family uses its own tokenizer and vocabulary. Even within OpenAI, o200k_base and cl100k_base produce different counts for the same text. Claude uses Anthropic's tokenizer. Gemini uses SentencePiece. Each splits words at different boundaries.
What is a context window in AI?
The context window is the total number of tokens a model can process at once, input plus output combined. GPT-4o's 128K context fits about 96,000 words.
How do I reduce my AI API costs?
Use a smaller model. At the prices listed above, GPT-4.1 Nano costs about 96% less than GPT-4o. Trim your system prompt. Use prompt caching for repeated context. Batch non-urgent requests.
What is BPE tokenization?
Byte Pair Encoding (BPE) starts with individual characters or bytes, then repeatedly merges the most frequent adjacent pair into a single token. Common words like "the" become one token. Rare terms get split into multiple tokens.
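The merge loop is easier to see on a tiny corpus. The following is a toy sketch of BPE training at the character level, using only the standard library. Production tokenizers work on bytes, pre-split text with regular expressions, and learn vocabularies of 100,000 entries or more, so treat this as an illustration rather than how any particular model's tokenizer is built. All function names here are made up for the example.

```python
from collections import Counter


def most_common_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_common_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = train_bpe("the the the then", num_merges=2)
print(merges)  # [('t', 'h'), ('th', 'e')]
print(words)   # {('the',): 3, ('the', 'n'): 1}
```

After two merges the frequent word "the" is a single symbol, while the rarer "then" is still split into two pieces.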
Which AI model has the largest context window?
Llama 4 Scout has the largest window listed here, at 10,000,000 tokens. Gemini 1.5 Pro supports 2,000,000 tokens. GPT-4.1, Gemini 2.5 Pro, and Llama 4 Maverick each support 1,000,000 tokens.
How AI Tokenization Works
AI models don't read words. They read tokens. A token is a chunk of text, typically 3-4 characters for English prose. "unbelievable" might become three tokens: "un", "belie", "vable". Common words like "the" are a single token. Numbers and punctuation are often split into separate tokens, and a leading space is usually merged into the token of the word that follows it.
OpenAI's GPT-4o and GPT-4.1 use the o200k_base tokenizer, which has a vocabulary of roughly 200,000 tokens. GPT-4 used cl100k_base, with roughly 100,000 tokens. Anthropic and Google use their own tokenizers, so the same text produces slightly different counts. That's why this tool shows exact counts for OpenAI models and estimates for Claude and Gemini.
Context windows matter because every API call has a hard limit on combined input and output tokens. GPT-4o's 128,000-token context fits about 96,000 English words. If your prompt plus expected output exceeds that limit, the request is rejected or the output is cut off. Llama 4 Scout's 10 million token context window can fit entire codebases.
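A quick pre-flight check can catch an oversized request before it is sent. This sketch assumes the combined input-plus-output limit described above. The `CONTEXT_LIMITS` values are copied from the comparison table, and the helper names are hypothetical. Providers may also cap output tokens separately, which is not modeled here.

```python
# Context window sizes in tokens, copied from the comparison table.
CONTEXT_LIMITS = {
    "GPT-4o": 128_000,
    "GPT-4.1": 1_000_000,
    "Llama 4 Scout": 10_000_000,
}


def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the requested output fits the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]


def remaining_output_budget(model: str, prompt_tokens: int) -> int:
    """Tokens left for the response once the prompt is accounted for."""
    return max(CONTEXT_LIMITS[model] - prompt_tokens, 0)


print(fits_context("GPT-4o", 120_000, 8_000))      # True
print(fits_context("GPT-4o", 120_000, 8_001))      # False
print(remaining_output_budget("GPT-4o", 100_000))  # 28000
```

Pass an exact count for `prompt_tokens` when you have one. With an estimated count, leave some headroom below the limit.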