How much does prompt caching actually save?

On well-cached content, 90% off the input rate. At the Opus input rate, a 100k-token system prompt plus tools that would cost $1.50 USD per turn drops to ~$0.15 USD per turn from the second turn on. Real sessions see a 50-80% effective input-token cost reduction, because part of every turn is new, uncached content.

Does Sonnet really do 95% of tasks as well as Opus?

For coding, refactoring and most agentic work in 2026, yes. The 5% where Opus wins is open-ended reasoning, multi-file planning, and gnarly debugging where the model has to hypothesise widely. For Australian small business workflows specifically, Sonnet 4.6 is usually enough.

Is there a flat-rate plan?

Not for Claude Code via the Anthropic Console at time of writing. Pay-per-token only. Claude.ai has a Pro/Team plan but that's for the chat UI, not the CLI.

Claude Code cost optimisation: how to cut your AUD bill 60% without losing quality

In short

Five levers, in priority order: (1) enable prompt caching aggressively, (2) default to Sonnet 4.6, (3) keep your CLAUDE.md tight, (4) use /compact and /clear at task boundaries, (5) audit your top spend sessions monthly. Doing all five took our monthly bill from $280 AUD to $110 AUD with no quality drop.

We run Claude Code daily across DotVA and the Lead Gen Empire network. Before we got serious about cost, our combined bill was roughly $280 AUD/month. After applying the five tactics below, it sits at $110 AUD/month with the same or better output. Here’s what works.

1. Prompt caching: the 60% lever

Prompt caching lets Claude reuse the system prompt + early-conversation content from prior turns at 10% of the normal input rate. On long multi-turn sessions, this is the single biggest cost lever.

How to actually benefit:

Keep early-conversation content stable. Don’t restructure your CLAUDE.md mid-session. Don’t add and remove tools constantly.
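Use the Claude Code default cache. Claude Code enables prompt caching automatically, so you don’t need to opt in. Direct API calls through messages.create or messages.stream are different: there you mark the cacheable blocks yourself with cache_control.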
Avoid /clear mid-session if you can /compact instead. Compact preserves cacheable content; clear nukes it.

A real example: our typical Lead Gen Empire content session has ~80k input tokens of CLAUDE.md, tool context and example articles. At the Sonnet input rate of $4.65 AUD per 1M tokens, the first turn costs ~$0.37 AUD in input. Subsequent turns with a cache hit cost ~$0.04 AUD in input. That is 10x cheaper from turn 2 onwards.
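The arithmetic behind that example can be sketched in a few lines. This is a rough model, not a billing tool: it uses the Sonnet input rate from the pricing table in this article, treats the whole prefix as cached from turn 2, and ignores both any cache-write surcharge and the new tokens each turn adds. The function names and the 20-turn session length are illustrative assumptions.

```python
# Input rate taken from the pricing table in this article (AUD per 1M tokens).
SONNET_INPUT_AUD_PER_M = 4.65
CACHE_READ_FRACTION = 0.10  # cached input is billed at 10% of the normal rate


def prefix_cost_aud(prefix_tokens: int, rate_per_m: float, cached: bool) -> float:
    """Input cost of the stable prefix for a single turn, in AUD."""
    rate = rate_per_m * CACHE_READ_FRACTION if cached else rate_per_m
    return prefix_tokens / 1_000_000 * rate


def session_prefix_cost_aud(prefix_tokens: int, turns: int, rate_per_m: float,
                            caching: bool = True) -> float:
    """Total prefix cost over a session: turn 1 at full price, later turns cached.

    Simplification: ignores any cache-write surcharge and per-turn new tokens.
    """
    if turns <= 0:
        return 0.0
    first = prefix_cost_aud(prefix_tokens, rate_per_m, cached=False)
    later = prefix_cost_aud(prefix_tokens, rate_per_m, cached=caching)
    return first + (turns - 1) * later


if __name__ == "__main__":
    tokens, turns = 80_000, 20  # 20 turns is an illustrative assumption
    with_cache = session_prefix_cost_aud(tokens, turns, SONNET_INPUT_AUD_PER_M)
    without = session_prefix_cost_aud(tokens, turns, SONNET_INPUT_AUD_PER_M,
                                      caching=False)
    print(f"with cache: ${with_cache:.2f} AUD, without: ${without:.2f} AUD")
```

The longer the session, the closer the average per-turn prefix cost gets to the cached rate, which is why stable early context matters more than any single turn.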

2. Right-size your model

Sonnet 4.6 vs Opus 4.7 in AUD (approx, FX 1 USD = 1.55 AUD):

Tier	Sonnet 4.6 input	Sonnet 4.6 output	Opus 4.7 input	Opus 4.7 output
Per 1M tokens (AUD)	$4.65	$23.25	$23.25	$116.25
With prompt caching	$0.47	$23.25	$2.33	$116.25

Opus is 5x more expensive on input, 5x more on output. For routine coding and Australian SMB workflows, that 5x doesn’t buy 5x more value.
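To see what the 5x means for a single task, here is a minimal sketch that prices the same workload on both models using the AUD rates from the table above. The 50k-in, 5k-out task size and the function name are made up for illustration, and caching is left out to keep the comparison simple.

```python
# AUD per 1M tokens, copied from the table above (FX 1 USD = 1.55 AUD).
RATES_AUD_PER_M = {
    "sonnet": {"input": 4.65, "output": 23.25},
    "opus": {"input": 23.25, "output": 116.25},
}


def task_cost_aud(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost of one task in AUD for the given model."""
    rates = RATES_AUD_PER_M[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: 50k tokens in, 5k tokens out.
    for model in RATES_AUD_PER_M:
        print(f"{model}: ${task_cost_aud(model, 50_000, 5_000):.2f} AUD")
```

Because both the input and output rates differ by the same factor, the ratio stays at 5x whatever the mix of input and output tokens.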

Default to Sonnet. Switch with /model opus only when:

You’re planning a multi-file refactor
You’re debugging something subtle where you’ve already tried Sonnet
You’re doing open-ended research where reasoning quality matters

Use /model haiku for batch text work, simple edits and throwaway tasks.

3. Keep your CLAUDE.md under 4k tokens

Every turn, your CLAUDE.md re-enters the context (or comes off the cache, at 10% of the input rate). A 30k-token CLAUDE.md is a 30k-token tax on every turn of every session.

Audit it. We had one CLAUDE.md in the Lead Gen Empire repo that had grown to 28k tokens over six months: every past decision, every recipe, every gotcha had ended up there. We trimmed it back to 3.8k tokens, with the long detail moved into docs/ and referenced on demand. Session cost dropped ~30%.

Test: open your CLAUDE.md in any token counter. If it’s over 4k tokens, you’re paying for context you probably aren’t using every session.
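If you don’t have a token counter handy, a rough check can be scripted. The sketch below assumes about 4 characters per token, which is only a rule of thumb for English text and not the real tokenizer, so treat the result as a ballpark. The function names and the CLAUDE.md path in the current directory are assumptions for illustration.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4    # rough rule of thumb for English text, not an exact tokenizer
BUDGET_TOKENS = 4_000  # the ceiling suggested in this section


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def over_budget(text: str, budget: int = BUDGET_TOKENS) -> bool:
    """True when the estimated token count exceeds the budget."""
    return estimate_tokens(text) > budget


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumed location: the current directory
    if path.exists():
        text = path.read_text(encoding="utf-8")
        status = "over" if over_budget(text) else "within"
        print(f"{path}: ~{estimate_tokens(text)} tokens, "
              f"{status} the {BUDGET_TOKENS}-token budget")
    else:
        print(f"{path} not found in the current directory")
```

A file that lands near the threshold deserves a proper count with a real tokenizer before you decide what to cut.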

4. Use /compact and /clear strategically

/compact when you’ve finished a task but want to keep the high-level context for the next task. Summarises everything to date into a much shorter version.
/clear when you’re switching to a genuinely unrelated task. Starts fresh, loses the cache.

Don’t be precious about context. The “I might need that earlier conversation” instinct is almost always wrong. If you do need it, you can re-derive it in 10 seconds with a fresh prompt. The accumulated context tax is much higher than the cost of one re-read.

5. Audit your top 5 sessions monthly

In Anthropic Console → Usage, you can see cost per session. Once a month, look at the top 5 most expensive sessions. Patterns emerge fast:

“I let Claude run overnight without /compact” → put a /compact in your morning checklist
“I switched to Opus for a simple task and forgot to switch back” → a hook that warns when you’ve been on Opus >30 min
“Same CLAUDE.md tax every session for this project” → trim that file

Our Lead Gen Empire September audit revealed a single workflow (the daily SEO review) was costing $40 AUD/month because we were running it on Opus when Sonnet was fine. Switched. $32 AUD/month saved with zero quality drop.
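The monthly audit is easy to script once you have per-session costs in some form. The sketch below assumes you have already collected them into simple records by hand or from an export. The Session fields, the function names and the sample data are all hypothetical and do not reflect any real export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    name: str        # hypothetical label for the session
    model: str       # e.g. "sonnet" or "opus"
    cost_aud: float  # total session cost in AUD


def top_sessions(sessions: list[Session], n: int = 5) -> list[Session]:
    """Return the n most expensive sessions, highest cost first."""
    return sorted(sessions, key=lambda s: s.cost_aud, reverse=True)[:n]


def spend_by_model(sessions: list[Session]) -> dict[str, float]:
    """Total spend per model, to spot work that drifted onto a pricier tier."""
    totals: dict[str, float] = defaultdict(float)
    for s in sessions:
        totals[s.model] += s.cost_aud
    return dict(totals)


if __name__ == "__main__":
    # Made-up records for illustration only.
    sessions = [
        Session("daily-seo-review", "opus", 9.80),
        Session("content-batch", "sonnet", 3.10),
        Session("bugfix", "sonnet", 1.25),
        Session("refactor-plan", "opus", 4.40),
    ]
    for s in top_sessions(sessions):
        print(f"{s.name:<18} {s.model:<7} ${s.cost_aud:.2f} AUD")
    print(spend_by_model(sessions))
```

Sorting by cost and then grouping by model is usually enough to surface the two patterns that matter most: runaway long sessions and routine work left on the expensive model.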

Tactic 6 (bonus): batch your throwaway work

If you’ve got a bunch of small jobs (rename these 50 files, summarise these 30 PDFs), run them all in one Claude Code session rather than spinning up a new session for each. The cache amortises across all the work; cost-per-task drops dramatically.

What you should not do to save money

Don’t skip CLAUDE.md. Saves a few cents per session, costs you hours in repeated explanation. Bad trade.
Don’t use Haiku for real work. It’s tempting but the quality drop is real. Use Haiku for genuine batch-text work or as a research subagent, not for primary development.
Don’t disable tools you actually use. Slightly fewer tool definitions = slightly cheaper context. The savings are tiny and the friction is high.
Don’t constantly switch projects. Cache lives per session. Project-hopping kills the cache.

What a “tight” Claude Code budget looks like

Use case	Realistic monthly AUD
Solo dev, 1-2 hrs/day on Claude Code, Sonnet + cache	$40 - $80
Solo operator, non-dev, weekly business automation	$15 - $40
Active developer, 4-6 hrs/day, heavy MCP usage	$120 - $200
Team of 4 sharing an account (don’t do this)	Pain and confusion

Don’t share Anthropic Console accounts across team members; you’ll never untangle who’s spending what. Each team member gets their own seat.

What we did at Boring Ventures

Bill timeline across three businesses combined:

February 2026: $280 AUD/month
March: $190 (added prompt caching enforcement to our CLAUDE.md template, switched default model)
April: $135 (cleaned up the worst CLAUDE.md offenders, added /compact discipline)
May: $110 (audited the top-5 sessions, killed one Opus-only workflow that didn’t need it)

Same output, more sessions, ~60% cheaper. The discipline is more valuable than any one trick.
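For anyone who wants to check the maths, the month-on-month and cumulative reductions from the timeline above can be worked out like this. The figures are the ones listed in the timeline, and the function name is just illustrative.

```python
# Monthly bills in AUD, as listed in the timeline above.
BILLS_AUD = [("February", 280), ("March", 190), ("April", 135), ("May", 110)]


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    baseline = BILLS_AUD[0][1]
    previous = baseline
    for month, bill in BILLS_AUD[1:]:
        print(f"{month}: {pct_reduction(previous, bill):.0f}% vs prior month, "
              f"{pct_reduction(baseline, bill):.0f}% vs February")
        previous = bill
```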

