LLM Cost Calculator
Real-world cost math for any LLM workload. Handles prompt caching, batch discounts, multi-turn context growth, and tool-call overhead. Paste a transcript to count its tokens (exact for OpenAI models, approximate for others), or compare every model side by side.
How the math works
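- Input/output: (tokens per request ÷ 1,000,000) × per-million-token rate × number of requests, computed separately for input and output because output tokens are usually priced higher.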
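- Prompt caching: cached prompt tokens are billed at the provider's cache-read rate, which can be up to 10× cheaper than the normal input rate. For Anthropic, writing a prefix to the cache costs 1.25× the input rate (default 5-minute cache), and each later read of it costs 0.1×.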
- Batch API: 50% off input and output on supported providers (OpenAI, Anthropic, Gemini), in exchange for asynchronous processing with results returned within 24 hours.
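- Multi-turn growth: every request resends all prior turns (user and assistant messages) as context, so input grows with each turn. Total input ≈ Σₙ (new input at turn n + accumulated context from turns 1 to n−1), which grows roughly quadratically with conversation length.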
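- Tool overhead: tool (function) schemas are sent with every request. At roughly 80 tokens per schema, the overhead is ≈ 80 × N tools × number of requests; schemas with long descriptions or nested parameters cost more.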
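A minimal Python sketch of the input/output, caching, and batch rules above. The `Pricing` class, the function names, and the sample rates are hypothetical; the 1.25× write and 0.1× read multipliers follow the Anthropic figures quoted above. It assumes the cache stays warm for the whole workload and that the 50% batch discount stacks with caching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token rates in USD."""
    input_per_m: float
    output_per_m: float
    cache_write_mult: float = 1.25  # Anthropic-style cache write (5-minute cache)
    cache_read_mult: float = 0.10   # cache reads billed at 0.1x the input rate


def request_cost(p: Pricing, fresh_in: int, cache_write_in: int,
                 cache_read_in: int, out: int) -> float:
    """Cost of one request, splitting input tokens by how they are billed."""
    in_rate = p.input_per_m / 1_000_000
    out_rate = p.output_per_m / 1_000_000
    return (fresh_in * in_rate
            + cache_write_in * in_rate * p.cache_write_mult
            + cache_read_in * in_rate * p.cache_read_mult
            + out * out_rate)


def workload_cost(p: Pricing, requests: int, prompt_tokens: int,
                  cached_prefix: int, out_tokens: int, batch: bool = False) -> float:
    """Total cost when every request shares a cached prompt prefix.

    The first request writes the prefix to the cache; later requests read it.
    Assumes the cache never expires and the batch discount stacks with caching.
    """
    if not 0 <= cached_prefix <= prompt_tokens:
        raise ValueError("cached_prefix must be between 0 and prompt_tokens")
    if requests <= 0:
        return 0.0
    fresh = prompt_tokens - cached_prefix  # tokens billed at the normal input rate
    first = request_cost(p, fresh, cached_prefix, 0, out_tokens)
    later = request_cost(p, fresh, 0, cached_prefix, out_tokens) * (requests - 1)
    total = first + later
    return total * 0.5 if batch else total  # 50% batch discount
```

With placeholder rates of $3/M input and $15/M output, 100 requests of 10,000 prompt tokens and 500 output tokens cost about $3.75 uncached, about $1.62 with an 8,000-token cached prefix, and about $0.81 with batching on top.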
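The multi-turn and tool-overhead rules above can be sketched the same way. This hypothetical helper assumes every request resends the system prompt, all tool schemas (at a flat 80 tokens each), and the full prior conversation, with no truncation or caching. It counts input tokens only; each assistant reply is also billed once as output on the turn that generates it.

```python
def multi_turn_input_tokens(system_tokens: int, user_turns: list,
                            assistant_turns: list, n_tools: int = 0,
                            tokens_per_tool: int = 80) -> int:
    """Total input tokens billed over a conversation.

    Every request resends the system prompt, all tool schemas, and every
    earlier user and assistant message, then appends the new user message.
    """
    if len(user_turns) != len(assistant_turns):
        raise ValueError("need one assistant reply per user turn")
    fixed = system_tokens + n_tools * tokens_per_tool  # resent on every request
    history = 0  # tokens from earlier turns, resent as context
    total = 0
    for user, assistant in zip(user_turns, assistant_turns):
        total += fixed + history + user
        history += user + assistant  # this turn becomes context for the next one
    return total


def multi_turn_closed_form(fixed: int, user: int, assistant: int, turns: int) -> int:
    """Same total for constant-size turns: n*(fixed + user) + (user + assistant)*n(n-1)/2."""
    return turns * (fixed + user) + (user + assistant) * turns * (turns - 1) // 2
```

The quadratic term is what makes long chats expensive: ten turns of 100 user and 200 assistant tokens on a 900-token fixed prefix (500-token system prompt plus five tools) bill 23,500 input tokens, versus 10,000 if history were not resent.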
Tokenization caveats
For OpenAI models (GPT-4o, o1, GPT-5), counts come from the official BPE tokenizer and are exact. For Claude, Gemini, and Llama, counts are calibrated approximations, typically within ~5% of what the API reports. For exact Claude counts, use Anthropic's /v1/messages/count_tokens endpoint.
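A sketch of calling Anthropic's token-counting endpoint with only the Python standard library. The endpoint path, the `x-api-key` and `anthropic-version` headers, and the `input_tokens` response field are Anthropic's; the function names are hypothetical, and you must substitute a real API key and a current Claude model ID.

```python
import json
import urllib.request
from typing import Optional

# Anthropic's token-counting endpoint (POST, same body shape as /v1/messages).
COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"


def build_count_tokens_request(api_key: str, model: str, messages: list,
                               system: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a token-count request for a Messages API payload."""
    body = {"model": model, "messages": messages}
    if system is not None:
        body["system"] = system
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def count_tokens(req: urllib.request.Request) -> int:
    """Send the request and return the input token count the API reports."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["input_tokens"]
```

Building the request separately from sending it keeps the payload easy to inspect; `count_tokens(build_count_tokens_request(key, model, messages))` performs the actual network call.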