Prompt caching, AI glossary

Prompt caching is a feature that lets a language model reuse previously-processed content from a prior request at a heavily discounted rate. Anthropic, OpenAI and others all support some form of it.

On Anthropic specifically, cached prompt tokens are billed at roughly 10% of the standard input rate. For long multi-turn sessions with a stable system prompt + tools + early conversation content, this cuts effective input cost dramatically.

How it works (Anthropic)

When you make a request to the Anthropic API (including via Claude Code), you can mark portions of your prompt as cacheable. Anthropic caches those portions for the next 5 minutes (or longer if hit again). Subsequent requests that share the same cached prefix pay only 10% of the normal input rate for that content.

In Claude Code, caching is on by default, you don’t need to think about it. The harness automatically marks the system prompt, tool definitions, and early conversation as cacheable.

What this means in AUD

A typical Claude Code session has a large constant prefix: your CLAUDE.md (~3-5k tokens), tool definitions (~3-8k tokens), and conversation history that grows over time.

Without caching:

Turn 1: ~10k input tokens × $4.65/M (AUD on Sonnet 4.6) = $0.047
Turn 5: ~30k input tokens × $4.65/M = $0.140
Turn 20: ~80k input tokens × $4.65/M = $0.372

With caching:

Turn 1: same, since nothing’s cached yet
Turn 5: most of the input is a cache hit, ~$0.014 cost
Turn 20: ~$0.037

The savings compound. Over a real working day, prompt caching cuts the input portion of your bill by 50-80%.

How to maximise cache hits

Keep your CLAUDE.md stable. Don’t restructure mid-session.
Use /compact instead of /clear when you can. Compact preserves cacheable structure; clear nukes the cache.
Don’t constantly add/remove tools. Tool definitions are part of the cached prefix.
Long sessions beat short sessions for caching economics. The cache amortises.

What doesn’t cache

Your most recent message
The model’s most recent response
Anything before the most recent edit to your CLAUDE.md or tool list

The cached portion is whatever’s stable + identical to a recent prior request.

How it works (Anthropic)

What this means in AUD

How to maximise cache hits

What doesn’t cache

Related terms

Get the next one in your inbox

Want this built for your business?

Keep reading

Prompt

Context window

In-context learning