Token, AI glossary

A token is the unit of text a language model processes. It’s neither a word nor a character, it’s a chunk somewhere in between, defined by the model’s tokenizer.

For English text, a rough rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words. So a 1,000-word document is roughly 1,300-1,400 tokens. A short email might be 200 tokens. A novel is 100,000+.

Numbers, code, and non-English text tokenise differently, sometimes more efficiently, sometimes less.

Why tokens matter

You’re billed per token. As of May 2026, Claude pricing in USD:

Model	Input ($/1M)	Output ($/1M)
Opus 4.7	$15	$75
Sonnet 4.6	$3	$15
Haiku 4.5	$1	$5

In AUD (1 USD ≈ 1.55 AUD), Sonnet costs roughly $4.65 / $23.25 per million input/output tokens.

A typical hour-long coding session with prompt caching might use:

80k input tokens × $3/M (USD) = $0.24 input cost
12k output tokens × $15/M (USD) = $0.18 output cost
Total: ~$0.42 USD or ~$0.65 AUD

Multiply across sessions to estimate monthly spend. A solo developer running 4-5 hours/day on Sonnet ends up at $40-100 AUD/month.

Output tokens cost 5x input

This is the single most important pricing fact. Asking for longer outputs (verbose explanations, full files rewritten when you only need a diff) is 5x more expensive than the input tokens you sent. Concise output is good economics as well as good UX.

Counting tokens

Anthropic’s tokenizer: available via the API (count_tokens endpoint) or in their Python/TS SDKs.
Rule of thumb: divide character count by 4.
For code: similar to English, slightly fewer characters per token because of operators and indentation.

Why tokens matter

Output tokens cost 5x input

Counting tokens

Related terms

Get the next one in your inbox

Want this built for your business?

Keep reading

Context window

Function calling

Large Language Model