Token Counter

Count tokens for any LLM.

Real-time token counting for 17 models: GPT-4.1, Claude, Gemini, DeepSeek, and more. No signup, no API key.

Model Comparison

Model              Provider   Context  Input / 1M  Output / 1M
GPT-4.1            OpenAI     1M       $2          $8
GPT-4.1 Mini       OpenAI     1M       $0.4        $1.6
GPT-4.1 Nano       OpenAI     1M       $0.1        $0.4
GPT-4o             OpenAI     128K     $2.5        $10
GPT-4o Mini        OpenAI     128K     $0.15       $0.6
o3                 OpenAI     200K     $10         $40
o4-mini            OpenAI     200K     $1.1        $4.4
Claude Opus 4.7    Anthropic  200K     $15         $75
Claude Sonnet 4.6  Anthropic  200K     $3          $15
Claude Haiku 4.5   Anthropic  200K     $0.8        $4
Gemini 2.5 Pro     Google     1M       $1.25       $10
Gemini 2.5 Flash   Google     1M       $0.3        $1.5
Gemini 2.0 Flash   Google     1M       $0.1        $0.4
Llama 4 Scout      Meta       10M      Free        Free
Llama 4 Maverick   Meta       1M       Free        Free

Pricing as of April 2026. Verify current rates at each provider's pricing page before estimating production costs.
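Per-million rates convert to a per-request cost with one multiplication. A minimal sketch, assuming the GPT-4o and GPT-4.1 Nano rates listed above; the `PRICES` dictionary and the `estimate_cost` function are illustrative names, not part of any provider SDK.

```python
# Illustrative rates in USD per 1M tokens, copied from the table above.
# Verify them on the provider's pricing page before relying on them.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000

# 1,000 input tokens and 1,000 output tokens on GPT-4o
print(f"${estimate_cost('gpt-4o', 1_000, 1_000):.4f}")  # $0.0125
```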

FAQ

Common questions.

How many tokens is 1,000 words?

About 1,333 tokens for standard English prose. One token equals roughly 0.75 words or 4 characters. Code, JSON, and technical text tokenize differently.
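The rule of thumb above can be sketched in a few lines. This is a rough estimate for English prose only, not a tokenizer; the function names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose
CHARS_PER_TOKEN = 4     # rule of thumb for English prose

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens from the character length of a string."""
    return round(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens_from_words(1_000))  # 1333
```

Expect the estimate to drift for code, JSON, and non-English text.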

What is a token in AI?

A token is the smallest chunk of text an AI model processes. GPT-4o breaks text into subword pieces using BPE. "unbelievable" might be three tokens: "un", "belie", "vable".

What is the token limit for GPT-4.1?

GPT-4.1 has a 1,000,000-token context window. GPT-4o supports 128,000 tokens. Both limits cover combined input and output.

Is Claude cheaper than GPT-4o per token?

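It depends on the model. Claude Haiku 4.5, at about $1 per million input tokens, is cheaper than GPT-4o at $2.50. Claude Sonnet 4.6 ($3) and Claude Opus 4.7 ($15) cost more. Among the paid models in the table above, GPT-4.1 Nano and Gemini 2.0 Flash are cheapest at $0.10. GPT-4.1 Mini ($0.40) and Claude Haiku 4.5 are common picks for balancing quality against cost.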

How do I count tokens without calling the API?

Use this tool. It runs entirely in your browser. No API key needed. No data leaves your device. OpenAI counts are exact because the tool uses a tiktoken-compatible library with the same encodings the models use.

Why does the same text have different token counts across models?

OpenAI's o200k_base and cl100k_base produce different counts for the same text. Claude uses Anthropic's tokenizer. Gemini uses SentencePiece. Each splits words at different boundaries.
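A toy illustration of why vocabularies matter. The two vocabularies below are made up, and the greedy longest-match rule is a simplification, not how BPE or SentencePiece actually segment text. It only shows that different vocabularies split the same word at different boundaries.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocab entry; fall back to one character."""
    tokens, i = [], 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for j in range(len(text), i + 1, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens

# Hypothetical vocabularies, invented for this example.
vocab_a = {"un", "belie", "vable"}
vocab_b = {"unbeliev", "able"}

print(greedy_tokenize("unbelievable", vocab_a))  # ['un', 'belie', 'vable']
print(greedy_tokenize("unbelievable", vocab_b))  # ['unbeliev', 'able']
```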

What is a context window in AI?

The context window is the total number of tokens a model can process at once, input plus output combined. GPT-4o's 128K context fits about 96,000 words.

How do I reduce my AI API costs?

Use a smaller model. At the rates listed above, GPT-4.1 Nano costs 96% less than GPT-4o for both input and output. Trim your system prompt. Use prompt caching for repeated context. Batch non-urgent requests.
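The saving from switching models follows directly from the listed rates. A minimal sketch using the GPT-4o and GPT-4.1 Nano prices from the table; `percent_savings` is an illustrative helper.

```python
def percent_savings(baseline_price: float, cheaper_price: float) -> float:
    """Return how much cheaper the second price is, as a percentage of the first."""
    return (1 - cheaper_price / baseline_price) * 100

# USD per 1M tokens, from the table above
print(round(percent_savings(2.50, 0.10)))   # input: 96
print(round(percent_savings(10.00, 0.40)))  # output: 96
```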

What is BPE tokenization?

Byte Pair Encoding starts with individual characters, then merges the most common pairs into single tokens. Common words like "the" become one token. Rare terms get split into multiple tokens.
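The merge loop can be sketched on a tiny corpus. This is a teaching version, assuming whitespace-separated words and character-level starting symbols; production tokenizers work on bytes and add pre-tokenization rules that are not modeled here.

```python
from collections import Counter

def most_common_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus: str, num_merges: int):
    """Learn up to num_merges merges; return the merge list and the segmented words."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_common_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

merges, words = train_bpe("the the the then", 2)
print(merges)  # [('t', 'h'), ('th', 'e')]
print(words)   # {('the',): 3, ('the', 'n'): 1}
```

After two merges, the frequent word "the" is a single token, while the rarer "then" is still split in two.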

Which AI model has the largest context window?

Among the models listed above, Llama 4 Scout has the largest context window: 10,000,000 tokens. Gemini 1.5 Pro supports 2,000,000 tokens. GPT-4.1, Gemini 2.5 Pro, and Llama 4 Maverick support 1,000,000 tokens.

How AI Tokenization Works

AI models don't read words. They read tokens. A token is a chunk of text, typically 3-4 characters for English prose. "unbelievable" might become three tokens: "un", "belie", "vable". Common words like "the" are a single token. Numbers and punctuation are usually split into their own tokens, and a leading space is typically folded into the token for the word that follows.

OpenAI's GPT-4o and GPT-4.1 use the o200k_base tokenizer, which has a vocabulary of roughly 200,000 tokens. GPT-4 used cl100k_base, with roughly 100,000. Anthropic and Google use their own tokenizers, so the same text produces slightly different counts. That's why this tool shows exact counts for OpenAI models and estimates for Claude and Gemini.

Context windows matter because every API call has a hard limit on combined input and output tokens. GPT-4o's 128,000-token context fits about 96,000 English words. If your prompt plus expected output exceeds that limit, the request fails or the output is cut off. Llama 4 Scout's 10-million-token context window can fit entire codebases.
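A pre-flight check catches oversized requests before they reach the API. A minimal sketch, assuming the context sizes listed above; `CONTEXT_LIMITS`, `fits_context`, and `max_output_budget` are illustrative names. Some APIs also cap output length separately, which is not modeled here.

```python
# Context sizes in tokens, taken from the table above.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "gpt-4.1": 1_000_000,
}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if prompt plus reserved output stays within the context window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

def max_output_budget(model: str, prompt_tokens: int) -> int:
    """Return how many output tokens remain after the prompt, never below zero."""
    return max(0, CONTEXT_LIMITS[model] - prompt_tokens)

print(fits_context("gpt-4o", 120_000, 8_000))   # True
print(fits_context("gpt-4o", 120_000, 8_001))   # False
print(max_output_budget("gpt-4o", 100_000))     # 28000
```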