LAUNCH PRICE $29 · LIFETIME UPDATES

Your Claude bill,
cut 40–80%.
Engineered, not guessed.

Anthropic gives you four enormous cost levers — cache reads at ~10% of input price, a flat −50% Batch API, a 10× price spread across the model lineup, and effort control. Almost nobody pulls them correctly. This playbook is the implementation manual: the exact mechanics, the failure modes, the eval gates, and the code.

Run the free 60-second audit first — if your recoverable spend isn't at least 100× the price of this book, don't buy it.
WHAT'S INSIDE

Nine chapters. Zero filler.

01
The cost anatomy of a Claude request
Where every dollar goes: input vs output vs thinking tokens, and why the 45/55 split matters.
02
Prompt caching that actually hits
Prefix mechanics, breakpoint placement, TTL economics, and the six silent invalidators (free preview below).
03
Batch API migration
Identifying batchable traffic, the polling pattern, combining batch + caching for compounding discounts.
04
Model right-sizing with eval gates
The routing architecture, building the eval set that proves Haiku/Sonnet is safe, the escalation pattern.
05
Effort tuning on the 4.6+ family
What each level actually changes, the sweep protocol, per-route defaults that hold up.
06
Context discipline
Compaction vs context editing, killing runaway histories, the 20-block lookback trap.
07
Fable 5 economics
When the $10/$50 frontier model is cheaper than retries on Opus — and when it never is.
08
Monitoring & regression alarms
The four usage fields to alert on, catching cache-hit-rate regressions before the invoice does.
09
The 30-day rollout plan
Sequenced by ROI-per-engineering-hour, with acceptance criteria per week.
FREE PREVIEW · FROM CHAPTER 02

The six silent cache invalidators

Prompt caching is a prefix match on exact bytes. One changed byte at position N invalidates every cache breakpoint at or after N. When teams tell me "we turned on caching and saved nothing," the cause is always one of six patterns sitting in the prompt-assembly path:

  1. Timestamps in the system prompt. f"Current date: {datetime.now()}" makes every request a unique prefix. Move time context into the last user message — after the cache breakpoint.
  2. Per-user or per-request IDs early in the prompt. Session IDs interpolated into the system prompt mean zero cross-user cache sharing. Put identity late, or pass it as message content.
  3. Non-deterministic serialization. json.dumps(d) without sort_keys=True, or iterating a set, shuffles bytes between identical requests.
  4. Conditional system sections. Every feature-flag combination is a distinct prefix. Freeze the system prompt; signal modes through messages.
  5. Varying tool sets. Tools render at position zero — before the system prompt. Adding, removing or reordering one tool invalidates everything. Sort tools by name; never swap the set mid-conversation.
  6. Prefixes below the cacheable minimum. Under ~1–4K tokens (model-dependent) the marker is silently ignored — no error, just cache_creation_input_tokens: 0.

The verification loop: check usage.cache_read_input_tokens on every response in staging. If it's zero on the second identical-prefix request, diff the rendered prompt bytes between the two requests — the invalidator will be staring at you. The full chapter covers breakpoint placement patterns, 5-minute vs 1-hour TTL break-even math, and cache pre-warming with max_tokens: 0.

FREE PREVIEW · FROM CHAPTER 03

Finding your batchable 30%

The Batch API is the least glamorous and most reliable lever: a flat 50% discount on every token, for accepting asynchronous delivery (typically under an hour, 24h worst case). The audit question is not "what is a batch job" — it's "which of my requests does a human actually wait for?" Everything else is batchable: nightly enrichment, embeddings-adjacent classification, eval runs, report generation, digest emails, backfills, re-scoring. Teams routinely discover 20–40% of traffic qualifies. Combined with caching, discounts compound: batched cache reads bill at 50% of the ~10% read price.

That's 2 of 9 chapters. The rest is implementation depth: code, eval gates, rollout sequencing.
Get the full Playbook — $29

Get the Playbook

$29 launch price · lifetime updates as the lineup changes
We're onboarding payments — reserve your copy at the $29 launch price and you'll get the download link the moment it's live (within days). No charge until you confirm.
30-day no-questions refund. If the free audit says you can't recover at least $2,900/year, keep your money.
5.maib.io — mAIb Technologies