Glossary
Latency
The time between sending a request to an AI model and getting a response. Matters most for user-facing features; not so much for background batch work.
For AI APIs, latency is usually reported as two numbers:
- Time to first token (TTFT): how long until the response starts streaming back
- Total response time: how long until the full response is complete
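Both numbers can be captured with a small timing wrapper around any streamed response. The following is a minimal sketch, not a specific SDK's API: `measure_stream` and `fake_stream` are hypothetical names, and the simulated delays are arbitrary. In real use, the clock should start before the request is sent.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, str]:
    """Return (ttft_seconds, total_seconds, full_text) for a stream of text chunks."""
    start = clock()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = clock() - start  # the first chunk has just arrived
        parts.append(chunk)
    total = clock() - start
    if ttft is None:
        ttft = total  # empty stream: no first token ever arrived
    return ttft, total, "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming API response."""
    time.sleep(0.05)  # simulated wait before the first token
    for piece in ["Hello", ", ", "world"]:
        yield piece
        time.sleep(0.01)  # simulated generation time per chunk


if __name__ == "__main__":
    ttft, total, text = measure_stream(fake_stream())
    print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms, text: {text!r}")
```

Passing the clock in as a parameter keeps the function easy to test with fixed timestamps.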
Approximate figures for Claude in 2026, from a Sydney-based client:
| Model | TTFT | Time per output token |
|---|---|---|
| Haiku 4.5 | ~250ms | ~5-15ms |
| Sonnet 4.6 | ~400ms | ~10-25ms |
| Opus 4.7 | ~800ms | ~20-40ms |
So a typical Sonnet response of 500 output tokens lands in roughly 5-13 seconds total: about 0.4 seconds to the first token, plus 500 tokens at 10-25ms each.
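The arithmetic behind that estimate is TTFT plus output tokens multiplied by per-token time. A minimal sketch, using the approximate Sonnet figures from the table (`estimate_total_seconds` is an illustrative helper, not part of any API):

```python
def estimate_total_seconds(ttft_ms: float, per_token_ms: float, output_tokens: int) -> float:
    """Rough total response time: time to first token plus generation time."""
    return (ttft_ms + per_token_ms * output_tokens) / 1000.0


# Sonnet 4.6 figures from the table: ~400 ms TTFT, ~10-25 ms per output token
low = estimate_total_seconds(400, 10, 500)
high = estimate_total_seconds(400, 25, 500)
print(f"{low:.1f}-{high:.1f} s")  # prints "5.4-12.9 s"
```

The same formula shows why output length dominates: at 500 tokens, TTFT is under a tenth of the total.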
When latency matters
- Voice interfaces (sub-second TTFT is the bar)
- Customer-facing chat (1-2 seconds feels acceptable; 5+ feels broken)
- Inline coding completion (every 100ms of latency reduces acceptance rate)
- Search-as-you-type (needs Haiku-tier or local models)
When latency doesn’t matter
- Background agents that run overnight (60 seconds vs 6 seconds is irrelevant if you’re asleep)
- Document drafts where you’ll edit before sending
- Batch processing where you’re handing the model 100 tasks
- One-off research queries
How to reduce latency
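- Pick a faster model. Haiku is faster than Sonnet, which is faster than Opus; in the table above, each step up roughly doubles both TTFT and time per token.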
- Reduce input tokens. Smaller prompts process faster.
- Reduce output tokens. Shorter responses finish sooner.
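- Use streaming. Total time stays the same, but the user sees output as soon as the first token arrives, which feels much faster.
- Pre-warm with prompt caching. Calls that reuse a cached prompt prefix skip reprocessing it, so subsequent calls in a session start responding sooner.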
- Stay geographically close. If you’re serving global users, deploy via Bedrock or Vertex in a region near them.
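The first and third levers can be combined into a simple latency budget check. The sketch below is illustrative only: it uses the approximate table figures with the slow end of each per-token range, assumes larger models are preferred whenever they fit the budget, and `pick_model` is a hypothetical helper.

```python
from typing import Optional

# (name, ttft_ms, worst_case_per_token_ms), largest and slowest model first.
# Figures are the approximate values from the table, not measured guarantees.
MODELS = [
    ("Opus 4.7", 800, 40),
    ("Sonnet 4.6", 400, 25),
    ("Haiku 4.5", 250, 15),
]


def pick_model(budget_s: float, output_tokens: int) -> Optional[str]:
    """Return the largest model whose worst-case total time fits the budget."""
    for name, ttft_ms, per_token_ms in MODELS:
        worst_case_s = (ttft_ms + per_token_ms * output_tokens) / 1000.0
        if worst_case_s <= budget_s:
            return name
    return None  # nothing fits: shorten the output or relax the budget


print(pick_model(2.0, 50))    # Sonnet 4.6
print(pick_model(10.0, 500))  # Haiku 4.5
print(pick_model(2.0, 500))   # None
```

With a 2-second budget, cutting the response from 500 tokens to 50 is what makes any model viable, which is why output length is worth checking before switching models.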
What latency doesn’t fix
Quality. If you’re getting wrong answers fast, fixing latency makes them wrong faster. Get the right answer first; optimise for speed second.