LLM Cost Calculator

Real-world cost math for any LLM workload. Handles prompt caching, batch discounts, multi-turn context growth, and tool-call overhead. Paste a transcript for exact tokens, or compare every model side by side.

Model

Input tokens per request

Output tokens per request

Number of requests

Advanced

Cached input tokens (per request)

Tool calls per request

Batch API (50% discount)

Multi-turn conversation

Configure your workload on the left. Cost updates live.

How the math works

Input/output: tokens × per-million rate × number of requests.
Prompt caching: cached tokens are billed at the cache-read rate (often 10× cheaper). For Anthropic, the first write costs 1.25× input rate, then reads are 0.1×.
Batch API: 50% discount on supported providers (OpenAI, Anthropic, Gemini). 24h SLA.
Multi-turn growth: conversation context resends prior turns each time. Total input ≈ Σ(turn_n input + accumulated context).
Tool overhead: each function schema added to system prompt costs ~80 tokens × N tools × every request.

Tokenization caveats

OpenAI models (GPT-4o, o1, GPT-5) use the official BPE tokenizer for exact counts. Claude / Gemini / Llama use calibrated approximations within ~5% of real API counts. For exact Claude counts, use Anthropic's /v1/messages/count_tokens endpoint.