UPDATED FOR THE CLAUDE 5 FAMILY · JULY 2026

You're overpaying
for Claude.

Most production workloads overpay 40–80%: prompt caching, the Batch API, model right-sizing and effort tuning sit unused. Audit your bill in 60 seconds, pick the right model, and see exactly what everything costs. Free, no signup.

01 · MODEL PICKER

Three questions.

1 · What are you building?
2 · What matters most?
3 · Expected volume?
Answer the three questions to get a recommendation — with the reasoning, the exact model ID, and a runner-up.
02 · COST CALCULATOR

What will it cost?

Live math across every current Claude model. Prices are per million tokens, from Anthropic's published list pricing.

Prompt cache hit rate0%
Cached input reads bill at ~10% of the input price. Assumes a warmed cache; the one-time 1.25× write premium isn't included.
Model $ / request $ / month vs cheapest
03 · THE LINEUP

Every current Claude model.

Click any model ID to copy it. Data cached 2026-06-24; verify against GET /v1/models for live values.

Model Model ID (click to copy) Context Max out Input $/MTok Output $/MTok Best for
Sonnet 5 intro pricing ($2 / $10 per MTok) applies through 2026-08-31, then $3 / $15. Batch API is 50% off all token usage on every model. Cache reads ≈ 10% of input price.
04 · INTEGRATION CHEATSHEET

Ship it without the 400s.


                
⚠ These return a 400 on the Claude 5 era models
  • budget_tokens — removed on Fable 5 / Opus 4.8 / 4.7 / Sonnet 5. Use thinking: {type: "adaptive"} + output_config.effort.
  • temperature / top_p / top_k — removed on the same models. Steer with prompting.
  • assistant prefill — 400 on all 4.6+ models. Use structured outputs (output_config.format).
  • thinking: {type: "disabled"} — 400 on Fable 5 only. Omit the parameter; thinking is always on.
Fable 5 specifics
  • Check stop_reason == "refusal" before reading content — safety classifiers can decline with HTTP 200.
  • Opt into server-side fallbacks by default: beta server-side-fallback-2026-06-01 + fallbacks: [{"model": "claude-opus-4-8"}].
  • Requires 30-day data retention — ZDR orgs get 400 on every request.
  • Same tokenizer as Opus 4.8; plan for minutes-long turns at high effort — stream everything.
Cost levers, cheapest first
  • Batch API — 50% off everything, async within 24h.
  • Prompt caching — reads at ~10% of input price. Keep stable content first; a timestamp in your system prompt silently kills it.
  • Right-size the model — Haiku 4.5 is 10× cheaper than Opus 4.8 on input.
  • effort: "medium" — often the cost/quality sweet spot on 4.6+ models.
05 · FOR YOUR TOOLING

models.json

The whole registry — IDs, pricing, context windows, discounts — as machine-readable JSON. Point your scripts, dashboards, and agents at it.

curl -s https://5.maib.io/models.json | jq '.models[] | {id, input_per_mtok, output_per_mtok}'

New model drops. This page updates.

Leave an email and get a one-line note when the lineup or pricing changes. Nothing else, ever.

5.maib.io — mAIb Technologies · Pricing data cached 2026-06-24, verify against the Anthropic Models API