LLM Cost Calculator

Real-world cost math for any LLM workload. Handles prompt caching, batch discounts, multi-turn context growth, and tool-call overhead. Paste a transcript for exact tokens, or compare every model side by side.

Advanced

Configure your workload on the left. Cost updates live.

How the math works

  • Input/output: tokens × per-million rate × number of requests.
  • Prompt caching: cached tokens are billed at the cache-read rate (often 10× cheaper). For Anthropic, the first write costs 1.25× input rate, then reads are 0.1×.
  • Batch API: 50% discount on supported providers (OpenAI, Anthropic, Gemini). 24h SLA.
  • Multi-turn growth: conversation context resends prior turns each time. Total input ≈ Σ(turn_n input + accumulated context).
  • Tool overhead: each function schema added to system prompt costs ~80 tokens × N tools × every request.

Tokenization caveats

OpenAI models (GPT-4o, o1, GPT-5) use the official BPE tokenizer for exact counts. Claude / Gemini / Llama use calibrated approximations within ~5% of real API counts. For exact Claude counts, use Anthropic's /v1/messages/count_tokens endpoint.