What is a context window, and why does it matter for your business?
Context windows in plain English. What they are, how they limit AI, why 1 million tokens matters, and the practical consequences for Australian small business use.
A context window is how much information an AI can hold in working memory at once. In May 2026, Claude Sonnet 4.6 holds 1 million tokens (~750,000 words). GPT-5 holds 256k tokens (~190,000 words). Gemini 2.5 Pro holds 1 million tokens. Bigger context lets AI read whole books, audit whole codebases, and remember more of a long conversation. For most business chat, you don’t need the biggest; for analysing long documents, you do.
The 60-second mental model
Imagine you’re a consultant who can read very fast but has zero permanent memory. Every meeting, you start from scratch. Whatever someone tells you or hands you, you can hold in your head for the meeting, but once the meeting ends, it’s gone.
That’s an AI model. The “head holding things” capacity is the context window.
- A small context (4k tokens) = you can read an 8-page brief
- A medium context (32k tokens) = you can read a 60-page document
- A large context (200k tokens) = you can read a 400-page report
- A very large context (1M tokens) = you can read about 2,000 pages, or several books at once
The bigger the window, the more you can hand the model at once. But every query that uses the context costs money (per-token billing) and runs slower.
What “token” actually means
A token is a chunk of text the model processes. Roughly:
- 1 token ≈ 0.75 words in English
- 1 page of text ≈ 500 tokens
- 1 short email ≈ 100-200 tokens
- 1 PDF book (300 pages) ≈ 150,000 tokens
The exact ratio varies by language and content type. Code uses more tokens per character. Common English words use fewer.
For most practical purposes: “tokens are how AI is billed and how much it can hold”. You don’t need to count them precisely.
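If you do want a ballpark figure, the rules of thumb above can be turned into a few lines of Python. This is a rough sketch only: the function names are made up for illustration, and a provider's real tokeniser will give different numbers, especially for code or non-English text.

```python
# Rule-of-thumb ratios taken from the list above. These are estimates,
# not what a real tokeniser would report.
WORDS_PER_TOKEN = 0.75
TOKENS_PER_PAGE = 500


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens by counting whitespace-separated words."""
    return estimate_tokens_from_words(len(text.split()))


def estimate_tokens_from_pages(pages: int) -> int:
    """Estimate tokens from a page count."""
    return pages * TOKENS_PER_PAGE
```

For example, `estimate_tokens_from_pages(300)` gives the same 150,000-token figure quoted above for a 300-page PDF book.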
Why bigger isn’t always better
Three practical limits:
1. Cost scales linearly with the tokens you send. A 1M-token query costs roughly 8x a 128k-token query (and a hundred times a 10k-token query). For everyday chat, that is money spent for no benefit.
2. Speed scales inversely. Bigger context = slower response. A 10k-token query returns in 5-10 seconds. A 500k-token query might take 30-90 seconds.
3. Accuracy degrades in the middle. Models suffer from “lost in the middle” syndrome: they pay close attention to the start and end of context, but mid-context information sometimes gets glossed over. Stuffing 500k tokens because you can doesn’t mean the model uses them effectively.
The right move: match context to task. Use small context for short queries; large context for genuinely long documents.
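The cost point is simple arithmetic, sketched below. It assumes pure per-token billing on input only and uses a rough price of $4.50 AUD per million input tokens as a placeholder; `query_cost` is a made-up helper name, and output tokens (billed separately) are ignored.

```python
def query_cost(input_tokens: int, price_per_million: float) -> float:
    """Input-side cost of one query under simple per-token billing."""
    return input_tokens / 1_000_000 * price_per_million


# Assumed placeholder price in AUD per million input tokens.
PRICE = 4.50

small = query_cost(10_000, PRICE)      # a short chat query
medium = query_cost(128_000, PRICE)    # a long report
large = query_cost(1_000_000, PRICE)   # a full 1M-token window

print(f"10k tokens:  ${small:.3f}")
print(f"128k tokens: ${medium:.3f}")
print(f"1M tokens:   ${large:.3f}")
print(f"1M vs 128k:  {large / medium:.1f}x")
print(f"1M vs 10k:   {large / small:.0f}x")
```

Whatever price you plug in, the ratios stay the same, which is why sending a huge context for a small question is wasteful.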
What 1 million tokens actually unlocks
The 1M-token context (Claude Sonnet 4.6 + Gemini 2.5 Pro) makes new things possible:
Whole-book analysis. Upload a 300-page legal contract. Ask “what are the indemnification clauses and how do they compare to industry-standard?” The model reads it all.
Whole-codebase audits. Paste an entire 50,000-line codebase. Ask “find security vulnerabilities”. Model reviews the lot.
Multi-month conversation logs. Past clients’ Slack history for context on their preferences. Past emails for tone training. Past customer support tickets for pattern recognition.
Bulk SKU analysis. Paste 5,000 product descriptions. Ask “find duplicates, identify SKUs without images, flag inconsistent pricing”.
All of these were impossible at 32k tokens. They’re routine at 1M.
What you’d care about for normal business use
Most Australian SMB AI use sits well below the context limits:
| Task | Typical tokens | Limit you’ll hit |
|---|---|---|
| Drafting an email | 500-2,000 | Never |
| Long-form blog post | 2,000-10,000 | Never |
| Reviewing a contract (10 pages) | 5,000-15,000 | Never |
| Analysing a quarterly report (50 pages) | 30,000-50,000 | 32k limit if on older models |
| Whole-website audit (50 pages) | 50,000-100,000 | 128k limit on older models |
| Full codebase review | 100k-500k+ | Anything below 200k |
| Reading a full novel | 100k-300k | Most 2025-era models |
Practical guidance: pay attention to context limits when you’re doing a specific, document-heavy task. Ignore them for chat-based work.
How to use context efficiently
Three patterns that save money and improve accuracy:
1. Start fresh for different tasks. Don’t try to have one long-running chat that covers everything. Start a new conversation for each distinct task. Smaller context, faster response, better accuracy.
2. Summarise long inputs first. Instead of pasting 100 pages and asking a question, first ask the AI to summarise the document into 5 bullet points. Then start a new conversation with just the summary + your question.
3. Put your question at the end. Models pay closest attention to the most-recent content. If you paste context and then ask a question, put the question at the very bottom.
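Pattern 3 is easy to bake into whatever script or template you use to assemble prompts. The sketch below is one possible layout, not a required format: `build_prompt` and the `<document>` wrapper are illustrative choices, and it only builds a string without calling any AI service.

```python
def build_prompt(document: str, question: str) -> str:
    """Place the long context first and the question at the very end."""
    return (
        "Here is the document to work from:\n\n"
        f"<document>\n{document.strip()}\n</document>\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_prompt(
    document="Either party may terminate with 30 days written notice.",
    question="What is the notice period?",
)
print(prompt)
```

The same helper works for pattern 2: pass in the summary from your first conversation as `document` instead of the full 100 pages.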
The provider differences
Mid-2026 state:
| Provider | Model | Context | Pricing (rough AUD per M tokens in) |
|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | 1M | ~$4.50 |
| Anthropic | Claude Opus 4.7 | 1M | ~$22 |
| Anthropic | Claude Haiku 4.5 | 1M | ~$1.20 |
| OpenAI | GPT-5 | 256k | ~$30 |
| OpenAI | GPT-5 mini | 256k | ~$1.50 |
| Google | Gemini 2.5 Pro | 1M | ~$5 |
| Google | Gemini 2.5 Flash | 1M | ~$0.75 |
If long context matters and you want a balance of price and capability, Claude Sonnet 4.6 is the sweet spot. If cheap-and-fast matters more, Gemini 2.5 Flash. If you’re on ChatGPT and don’t need the absolute biggest, GPT-5 at 256k is plenty for 95% of business tasks.
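To make the comparison concrete, here is a small sketch that picks the cheapest model whose window fits a job. The figures are the rough ones from the table above and will date quickly; `cheapest_model` is a hypothetical helper, and it looks at input price only, not answer quality or speed.

```python
from typing import Optional

# (provider, model, context window in tokens, rough AUD per million input tokens)
# Figures copied from the table above; treat them as approximate.
MODELS = [
    ("Anthropic", "Claude Sonnet 4.6", 1_000_000, 4.50),
    ("Anthropic", "Claude Opus 4.7", 1_000_000, 22.00),
    ("Anthropic", "Claude Haiku 4.5", 1_000_000, 1.20),
    ("OpenAI", "GPT-5", 256_000, 30.00),
    ("OpenAI", "GPT-5 mini", 256_000, 1.50),
    ("Google", "Gemini 2.5 Pro", 1_000_000, 5.00),
    ("Google", "Gemini 2.5 Flash", 1_000_000, 0.75),
]


def cheapest_model(required_tokens: int, provider: Optional[str] = None) -> Optional[str]:
    """Return the cheapest listed model that can hold required_tokens, or None."""
    candidates = [
        (price, name)
        for prov, name, context, price in MODELS
        if context >= required_tokens and (provider is None or prov == provider)
    ]
    return min(candidates)[1] if candidates else None


print(cheapest_model(300_000))                     # any provider
print(cheapest_model(100_000, provider="OpenAI"))  # staying on ChatGPT
print(cheapest_model(300_000, provider="OpenAI"))  # too big for a 256k window
```

Price alone will usually point at the smallest model, so treat the result as a starting shortlist rather than a recommendation.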
The “prompt caching” bonus
Anthropic introduced prompt caching in 2024 (now widely supported across providers). The idea: if you reuse the same context across many queries (e.g. a 200-page knowledge base that every query references), the cached part of the second query onwards is billed at up to about 90% less, because the provider stores the already-processed context instead of reading it again from scratch.
This is how we run agents that touch large contexts repeatedly without breaking the budget. See Prompt caching for the deeper explanation.
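A back-of-envelope sketch of the saving, under simplifying assumptions: the first query pays full price for the shared context, every later query pays 10% for it, and the new question is always billed in full. Real providers may add a surcharge for writing to the cache and expire cached content after a while, so treat the numbers as illustrative. `input_cost` and the sample figures are made up for the example.

```python
def input_cost(
    context_tokens: int,
    question_tokens: int,
    n_queries: int,
    price_per_million: float,
    cache_discount: float = 0.90,
    use_cache: bool = True,
) -> float:
    """Total input cost for n_queries that all share the same context."""
    if n_queries <= 0:
        return 0.0
    if use_cache:
        # Full price once, then the discounted rate for each repeat.
        repeats = n_queries - 1
        context_billed = context_tokens + repeats * context_tokens * (1 - cache_discount)
    else:
        context_billed = n_queries * context_tokens
    total_tokens = context_billed + n_queries * question_tokens
    return total_tokens / 1_000_000 * price_per_million


# Assumed scenario: a 200-page knowledge base (about 100,000 tokens),
# 100 questions of about 200 tokens each, at a rough $4.50 AUD per million.
without_cache = input_cost(100_000, 200, 100, 4.50, use_cache=False)
with_cache = input_cost(100_000, 200, 100, 4.50, use_cache=True)

print(f"Without caching: ${without_cache:.2f}")
print(f"With caching:    ${with_cache:.2f}")
```

The more often the same context is reused, the closer the overall saving gets to the full discount.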
What’s next
- Context window for the glossary-style technical definition.
- Token for what a token actually is.
- Prompt caching for the cost-optimisation pattern.
- Claude vs ChatGPT for Australian small business for picking the right model based on context needs.