Question 1

How many tokens is 1,000 words?

Accepted Answer

About 1,333 tokens for standard English prose. One token is roughly 0.75 words, or about 4 characters, so 1,000 words ÷ 0.75 ≈ 1,333 tokens. Code, JSON, and technical text tokenize differently: they often need more tokens per word because of special characters and short subwords.
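The rule of thumb fits in a few lines of Python. This is a rough estimator for English prose only, assuming the 0.75 words-per-token and 4 characters-per-token ratios quoted here. The function names are illustrative, and real counts for code or JSON will differ.

```python
# Rough ratios for English prose (rules of thumb, not exact tokenizer output).
WORDS_PER_TOKEN = 0.75
CHARS_PER_TOKEN = 4

def tokens_from_words(words: int) -> int:
    """Estimate tokens from a word count."""
    return round(words / WORDS_PER_TOKEN)

def tokens_from_chars(chars: int) -> int:
    """Estimate tokens from a character count."""
    return round(chars / CHARS_PER_TOKEN)

def words_from_tokens(tokens: int) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_from_words(1_000))    # 1333
print(words_from_tokens(128_000))  # 96000
```

The inverse, `words_from_tokens`, is the same arithmetic used to say a 128K context holds about 96,000 words.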

Question 2

What is a token in AI?

Accepted Answer

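A token is the smallest chunk of text an AI model processes. GPT-4o breaks text into subword pieces using BPE (Byte Pair Encoding), so "unbelievable" might be three tokens: "un", "belie", "vable". In OpenAI's tokenizers, punctuation usually becomes its own token, a leading space is typically attached to the word that follows it, and long numbers are split into chunks of up to three digits.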
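Before BPE merges are applied, GPT-style tokenizers first split text into chunks with a regular expression, which is where the handling of spaces and numbers comes from. The sketch below is a simplified, ASCII-only approximation of that pre-tokenization step, not the exact Unicode-aware pattern tiktoken ships; `PRETOKENIZE` and `pretokenize` are hypothetical names.

```python
import re

# Simplified, ASCII-only approximation of a GPT-style pre-tokenization pattern.
# Assumption: the real cl100k_base / o200k_base patterns are Unicode-aware and more detailed.
PRETOKENIZE = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)"   # common English contractions
    r"| ?[A-Za-z]+"            # a word, with its leading space attached
    r"|\d{1,3}"                # digits, in chunks of at most three
    r"| ?[^\sA-Za-z\d]+"       # punctuation runs, with an optional leading space
    r"|\s+(?!\S)"              # whitespace runs, leaving the last space for the next word
    r"|\s+"                    # any remaining whitespace
)

def pretokenize(text: str) -> list[str]:
    """Split text into the chunks that BPE merges would then operate on."""
    return PRETOKENIZE.findall(text)

print(pretokenize("I paid 12345 dollars!"))
# ['I', ' paid', ' ', '123', '45', ' dollars', '!']
```

Each chunk is then tokenized separately, so a merge never crosses a chunk boundary.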

Question 3

What is the token limit for GPT-4.1?

Accepted Answer

GPT-4.1 has a 1,000,000-token context window; GPT-4o supports 128,000 tokens. Both limits cover input and output combined, so if you send 120,000 tokens of input to GPT-4o, the response can use at most 8,000 tokens before hitting the cap. Each model also enforces a separate, smaller limit on output tokens per response.
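The budget arithmetic is a subtraction plus a clamp for the per-response output limit. In this sketch, `output_budget` and its `max_output_cap` parameter are hypothetical; look up the actual output limit for your model in the provider's documentation rather than relying on any value used here.

```python
from typing import Optional

def output_budget(context_window: int, input_tokens: int,
                  max_output_cap: Optional[int] = None) -> int:
    """Tokens left for the response once the input is counted against the window."""
    remaining = max(0, context_window - input_tokens)  # never negative
    if max_output_cap is not None:
        # The per-response output limit can be smaller than what the window leaves.
        remaining = min(remaining, max_output_cap)
    return remaining

print(output_budget(128_000, 120_000))  # 8000
```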

Question 4

Is Claude cheaper than GPT-4o per token?

Accepted Answer

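It depends on the Claude model. Claude Haiku 4.5 costs $1 per million input tokens, less than GPT-4o's $2.50, while Claude Sonnet 4.6 ($3) and Claude Opus 4.7 ($15) cost more. Among the models listed here, the cheapest input rates are GPT-4.1 Nano and Gemini 2.0 Flash at $0.10 per million tokens; Gemini 2.5 Flash is $0.30. For most production chatbots, GPT-4.1 Mini ($0.40) or Claude Haiku 4.5 ($1) give the best quality-to-cost ratio.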
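Comparing models on price comes down to weighting input and output rates by how many tokens a request actually uses. This minimal sketch hard-codes three prices from the table on this page (prices change, so treat them as placeholders) and assumes a hypothetical request of 1,000 input and 500 output tokens.

```python
# (input, output) prices in US dollars per 1M tokens, taken from the table on this page.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4.1 Mini": (0.40, 1.60),
    "GPT-4.1 Nano": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: each token type is billed at its own rate."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, 1_000, 500):.4f}")
# GPT-4o: $0.0075
# GPT-4.1 Mini: $0.0012
# GPT-4.1 Nano: $0.0003
```

Output tokens usually cost several times more than input tokens, so chat-style workloads with long replies shift the comparison more than the input price alone suggests.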

Question 5

How do I count tokens without calling the API?

Accepted Answer

Use this token counter. It runs entirely in your browser, so no API key is needed and no data leaves your device. For OpenAI models, counts are exact because it uses a tiktoken-compatible library with the same encodings OpenAI's models use. For Claude and Gemini, counts are estimates, accurate to within 2-3%.
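The counter on this page runs in the browser, but the same offline approach works in Python with OpenAI's open-source `tiktoken` package (`pip install tiktoken`). This sketch assumes `tiktoken` is installed. Because `get_encoding` downloads the encoding file on first use, the hypothetical `count_tokens` helper falls back to the 4-characters-per-token estimate if anything goes wrong. It only gives exact counts for OpenAI encodings, not for Claude or Gemini.

```python
def count_tokens(text: str, encoding_name: str = "o200k_base") -> tuple[int, str]:
    """Return (token_count, method): exact via tiktoken when possible, else a rough estimate."""
    try:
        import tiktoken  # OpenAI's open-source tokenizer library
        enc = tiktoken.get_encoding(encoding_name)  # o200k_base is used by GPT-4o / GPT-4.1
        return len(enc.encode(text)), "exact"
    except Exception:
        # tiktoken missing, encoding name unknown, or the encoding file could not be downloaded.
        return round(len(text) / 4), "estimate"

count, method = count_tokens("How many tokens is this sentence?")
print(count, method)
```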

Question 6

Why does the same text have different token counts across models?

Accepted Answer

OpenAI's o200k_base tokenizer (GPT-4o, GPT-4.1) and its older cl100k_base tokenizer (GPT-4, GPT-3.5) produce different counts for the same text. Claude uses Anthropic's own tokenizer, and Gemini uses SentencePiece. Each tokenizer was trained on different data with a different vocabulary, so the same word can be split at different boundaries.
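A toy example shows why the vocabulary matters. The sketch below is not BPE: it uses simple greedy longest-match tokenization over two hypothetical vocabularies (`VOCAB_A`, `VOCAB_B`), falling back to single characters, to show one word producing different token counts.

```python
def greedy_tokenize(text, vocab):
    """Split text by always taking the longest vocabulary entry that matches next."""
    tokens, i = [], 0
    max_len = max([len(t) for t in vocab] + [1])
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

VOCAB_A = {"un", "believ", "able"}   # hypothetical vocabulary
VOCAB_B = {"unbeliev", "able"}       # another hypothetical vocabulary

print(greedy_tokenize("unbelievable", VOCAB_A))  # ['un', 'believ', 'able']
print(greedy_tokenize("unbelievable", VOCAB_B))  # ['unbeliev', 'able']
```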

Question 7

What is a context window in AI?

Accepted Answer

The context window is the total number of tokens a model can handle in one request, input and output combined. GPT-4o's 128K context holds about 96,000 words of English prose (128,000 × 0.75). Llama 4 Scout has the largest context window of the models covered here, at 10 million tokens.

Question 8

How do I reduce my AI API costs?

Accepted Answer

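Four approaches work. Use a smaller model: GPT-4.1 Nano costs 96% less than GPT-4o ($0.10 vs. $2.50 per million input tokens). Trim your system prompt, since it is sent and billed with every request. Use prompt caching for repeated context; cached input tokens are typically 50-90% cheaper, depending on the provider and model. Batch non-urgent requests for a 50% discount on OpenAI and Anthropic.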
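The effect of each lever can be estimated with simple arithmetic. This sketch uses GPT-4o and GPT-4.1 Nano prices from the table on this page and a hypothetical workload of 10M input and 2M output tokens per month. It assumes cached tokens get a flat discount, ignores cache-write surcharges, and multiplies the batch discount on top; whether discounts stack, and by how much, varies by provider.

```python
def monthly_cost(input_tokens, output_tokens, price_in, price_out,
                 cached_fraction=0.0, cache_discount=0.0, batch_discount=0.0):
    """Estimated dollars per month; prices are dollars per 1M tokens.

    Simplifying assumptions: `cached_fraction` of input tokens is billed at
    (1 - cache_discount) of the normal input price, cache-write surcharges are
    ignored, and `batch_discount` is applied to the whole bill.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - cache_discount)) * price_in / 1_000_000
    output_cost = output_tokens * price_out / 1_000_000
    return (input_cost + output_cost) * (1 - batch_discount)

MONTHLY_IN, MONTHLY_OUT = 10_000_000, 2_000_000  # hypothetical workload

baseline = monthly_cost(MONTHLY_IN, MONTHLY_OUT, 2.50, 10.00)   # GPT-4o
smaller = monthly_cost(MONTHLY_IN, MONTHLY_OUT, 0.10, 0.40)     # GPT-4.1 Nano
with_cache = monthly_cost(MONTHLY_IN, MONTHLY_OUT, 2.50, 10.00,
                          cached_fraction=0.6, cache_discount=0.5)
batched = monthly_cost(MONTHLY_IN, MONTHLY_OUT, 2.50, 10.00, batch_discount=0.5)

print(f"GPT-4o baseline: ${baseline:.2f}")    # $45.00
print(f"GPT-4.1 Nano:    ${smaller:.2f}")     # $1.80
print(f"With caching:    ${with_cache:.2f}")  # $37.50
print(f"Batched:         ${batched:.2f}")     # $22.50
```

Caching only discounts input tokens, which is why it saves less here than batching, which discounts the whole bill.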

Question 9

What is BPE tokenization?

Accepted Answer

Byte Pair Encoding (BPE) is the algorithm OpenAI's tokenizers use to split text into tokens. During training, it starts from individual bytes (or characters) and repeatedly merges the most frequent adjacent pair into a new single token until the vocabulary reaches its target size. Common words like "the" end up as one token, while rare technical terms get split into several tokens, which is why specialized text costs more to process.
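A toy version of BPE training fits in a few lines. This sketch works on characters rather than bytes and skips the regex pre-tokenization real tokenizers apply; `train_bpe`, `merge_pair`, and `encode` are illustrative names, not tiktoken's API. Ties between equally frequent pairs are broken by first occurrence.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    a, b = pair
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            merged.append(a + b)
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a tuple of single characters, weighted by how often it appears.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Tokenize a word by replaying the learned merges in the order they were learned."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

corpus = ["the"] * 10 + ["then"] * 2 + ["xyz"]
merges = train_bpe(corpus, num_merges=2)
print(merges)                  # [('t', 'h'), ('th', 'e')]
print(encode("then", merges))  # ['the', 'n']
print(encode("xyz", merges))   # ['x', 'y', 'z']
```

The frequent word "the" collapses into one token, while the rare "xyz" stays split into single characters, mirroring why uncommon terms cost more tokens.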

Question 10

Which AI model has the largest context window?

Accepted Answer

Llama 4 Scout has a 10,000,000-token context window, the largest of the models covered here. Gemini 1.5 Pro supports 2,000,000 tokens. GPT-4.1 supports 1,000,000 tokens, and Claude Sonnet 4.6 up to 1,000,000 (its standard window is 200,000). GPT-4o is capped at 128,000 tokens.

Model	Provider	Context (tokens)	Input ($ / 1M tokens)	Output ($ / 1M tokens)
GPT-4.1	OpenAI	1M	$2.00	$8.00
GPT-4.1 Mini	OpenAI	1M	$0.40	$1.60
GPT-4.1 Nano	OpenAI	1M	$0.10	$0.40
GPT-4o	OpenAI	128K	$2.50	$10.00
GPT-4o Mini	OpenAI	128K	$0.15	$0.60
o3	OpenAI	200K	$2.00	$8.00
o4-mini	OpenAI	200K	$1.10	$4.40
Claude Opus 4.7	Anthropic	200K	$15.00	$75.00
Claude Sonnet 4.6	Anthropic	200K	$3.00	$15.00
Claude Haiku 4.5	Anthropic	200K	$1.00	$5.00
Gemini 2.5 Pro	Google	1M	$1.25	$10.00
Gemini 2.5 Flash	Google	1M	$0.30	$2.50
Gemini 2.0 Flash	Google	1M	$0.10	$0.40
Llama 4 Scout	Meta	10M	Free (open weights)	Free (open weights)
Llama 4 Maverick	Meta	1M	Free (open weights)	Free (open weights)
