The Claude Cost Playbook — cut your Anthropic bill 40–80%

WHAT'S INSIDE

Nine chapters. Zero filler.

01

The cost anatomy of a Claude request

Where every dollar goes: input vs output vs thinking tokens, and why the 45/55 split matters.

02

Prompt caching that actually hits

Prefix mechanics, breakpoint placement, TTL economics, and the six silent invalidators (free preview below).

03

Batch API migration

Identifying batchable traffic, the polling pattern, combining batch + caching for compounding discounts.

04

Model right-sizing with eval gates

The routing architecture, building the eval set that proves Haiku/Sonnet is safe, the escalation pattern.

05

Effort tuning on the 4.6+ family

What each level actually changes, the sweep protocol, per-route defaults that hold up.

06

Context discipline

Compaction vs context editing, killing runaway histories, the 20-block lookback trap.

07

Fable 5 economics

When the $10/$50 frontier model is cheaper than retries on Opus — and when it never is.

08

Monitoring & regression alarms

The four usage fields to alert on, catching cache-hit-rate regressions before the invoice does.

09

The 30-day rollout plan

Sequenced by ROI-per-engineering-hour, with acceptance criteria per week.

FREE PREVIEW · FROM CHAPTER 02

The six silent cache invalidators

Prompt caching is a prefix match on exact bytes. One changed byte at position N invalidates every cache breakpoint at or after N. When teams tell me "we turned on caching and saved nothing," the cause is always one of six patterns sitting in the prompt-assembly path:

Timestamps in the system prompt. f"Current date: {datetime.now()}" makes every request a unique prefix. Move time context into the last user message — after the cache breakpoint.
Per-user or per-request IDs early in the prompt. Session IDs interpolated into the system prompt mean zero cross-user cache sharing. Put identity late, or pass it as message content.
Non-deterministic serialization. json.dumps(d) without sort_keys=True, or iterating a set, shuffles bytes between identical requests.
Conditional system sections. Every feature-flag combination is a distinct prefix. Freeze the system prompt; signal modes through messages.
Varying tool sets. Tools render at position zero — before the system prompt. Adding, removing or reordering one tool invalidates everything. Sort tools by name; never swap the set mid-conversation.
Prefixes below the cacheable minimum. Under ~1–4K tokens (model-dependent) the marker is silently ignored — no error, just cache_creation_input_tokens: 0.

The verification loop: check usage.cache_read_input_tokens on every response in staging. If it's zero on the second identical-prefix request, diff the rendered prompt bytes between the two requests — the invalidator will be staring at you. The full chapter covers breakpoint placement patterns, 5-minute vs 1-hour TTL break-even math, and cache pre-warming with max_tokens: 0.

FREE PREVIEW · FROM CHAPTER 03

Finding your batchable 30%

The Batch API is the least glamorous and most reliable lever: a flat 50% discount on every token, for accepting asynchronous delivery (typically under an hour, 24h worst case). The audit question is not "what is a batch job" — it's "which of my requests does a human actually wait for?" Everything else is batchable: nightly enrichment, embeddings-adjacent classification, eval runs, report generation, digest emails, backfills, re-scoring. Teams routinely discover 20–40% of traffic qualifies. Combined with caching, discounts compound: batched cache reads bill at 50% of the ~10% read price.

That's 2 of 9 chapters. The rest is implementation depth: code, eval gates, rollout sequencing.

Get the full Playbook — $29

Get the Playbook

$29 launch price · lifetime updates as the lineup changes

We're onboarding payments — reserve your copy at the $29 launch price and you'll get the download link the moment it's live (within days). No charge until you confirm.

30-day no-questions refund. If the free audit says you can't recover at least $2,900/year, keep your money.

5.maib.io — mAIb Technologies

Hub Free audit

Your Claude bill,cut 40–80%.Engineered, not guessed.

Nine chapters. Zero filler.

The six silent cache invalidators

Finding your batchable 30%

Get the Playbook

Your Claude bill,
cut 40–80%.
Engineered, not guessed.