Prompt caching is supposed to save money — instead the bill went up 40%

A team enabled Anthropic prompt caching expecting to save 90% on system-prompt tokens. Instead the bill went up. Each request injected the current timestamp into the system prompt, which invalidated the cache on every single call. They paid the 1.25× cache-write multiplier with zero cache hits.

What happened

Customer support agent. System prompt is ~12K tokens of product documentation, brand guidelines, and conversation policies. Anthropic prompt caching enabled with a 5-minute TTL.

Expected: first call writes the cache (cost = 12K × $3.75/M = $0.045), every subsequent call within 5 minutes reads it (cost = 12K × $0.30/M = $0.0036). Without caching the same tokens cost 12K × $3.00/M = $0.036 per call, so each cache read saves 90%.
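The arithmetic is easy to check. The sketch below uses the per-million-token rates quoted above. The $3.00 uncached rate is an assumption inferred from them ($3.75 / 1.25), and the names are illustrative. A cache read comes out 90% cheaper than uncached input (92% cheaper than a cache write), while a cache write is 25% more expensive than uncached input.

```python
# Per-request cost of the 12K-token system prompt under each pricing mode.
SYSTEM_PROMPT_TOKENS = 12_000
BASE_RATE = 3.00    # $/M tokens, uncached input (assumed: 3.75 / 1.25)
WRITE_RATE = 3.75   # $/M tokens, cache write
READ_RATE = 0.30    # $/M tokens, cache read


def prompt_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of `tokens` input tokens at the given $/M rate."""
    return tokens * rate_per_million / 1_000_000


uncached = prompt_cost(SYSTEM_PROMPT_TOKENS, BASE_RATE)
write = prompt_cost(SYSTEM_PROMPT_TOKENS, WRITE_RATE)
read = prompt_cost(SYSTEM_PROMPT_TOKENS, READ_RATE)

print(f"uncached ${uncached:.4f}  write ${write:.4f}  read ${read:.4f}")
print(f"read vs uncached:  {1 - read / uncached:.0%} cheaper")
print(f"write vs uncached: {write / uncached - 1:.0%} more expensive")
```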

Actual:

Day 1 cost:  $124  (caching disabled)
Day 2 cost:  $173  (caching enabled)

40% increase.

The cause was buried in the system-prompt builder:

from datetime import datetime

def build_system_prompt():
    # Bug: the timestamp makes the prompt text different on every call.
    return f"""
You are SupportBot. Current time: {datetime.now().isoformat()}.
[12K tokens of documentation]
"""

The timestamp changed on every call, and it sat at the very top of the prompt. Caching matches on an exact prefix, so no request's prompt ever matched an earlier one. Every call was a cache miss → cache write at the 1.25× multiplier. Zero reads.
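A small reproduction makes the failure concrete. This is a sketch, not the provider's mechanism: the SHA-256 fingerprint is only a stand-in for "is the prompt byte-identical?", the clock is passed in as a parameter so the result is reproducible, and the date is arbitrary.

```python
import hashlib
from datetime import datetime, timedelta

DOCS = "[12K tokens of documentation]"  # placeholder for the static docs


def build_system_prompt(now: datetime) -> str:
    # Same shape as the buggy builder, with the clock injected.
    return f"You are SupportBot. Current time: {now.isoformat()}.\n{DOCS}\n"


def fingerprint(text: str) -> str:
    # Stand-in for an exact-match check; NOT the provider's real cache key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


t0 = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example time
first = build_system_prompt(t0)
second = build_system_prompt(t0 + timedelta(seconds=1))

# The documentation is identical, but the prompts as a whole are not.
print(fingerprint(first) == fingerprint(second))  # False
print(fingerprint(DOCS) == fingerprint(DOCS))     # True
```

One second of clock drift is enough to make the two prompts differ from the first line onward, even though the 12K tokens that follow are the same.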

Diagnosis

Cache writes can cost more than ordinary input, depending on the provider:

Anthropic: cache-write = 1.25× input rate, cache-read = 0.1× input rate
OpenAI: cache-write = same as input, cache-read = 0.5× input

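For caching to save money when writes carry a premium, the cache hit rate has to clear a threshold. With hit rate h, the expected cost per request relative to uncached input is (1 - h) × 1.25 + h × 0.1. Setting that equal to 1 gives, for Anthropic: breakeven_hit_rate = (1.25 - 1) / (1.25 - 0.1) ≈ 22%. Below roughly 22% hit rate, caching costs more than no caching.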
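Written out as code, the break-even follows from a simple model: every miss is billed as a cache write, every hit as a cache read, and cost is measured in units of the uncached input rate. That model and the function names are assumptions for illustration. Solving (1 - h) × write + h × read = 1 for h gives (write - 1) / (write - read).

```python
def relative_cost(hit_rate: float, write_mult: float, read_mult: float) -> float:
    """Expected cost per request, in units of the uncached input rate."""
    return (1 - hit_rate) * write_mult + hit_rate * read_mult


def breakeven_hit_rate(write_mult: float, read_mult: float) -> float:
    """Hit rate at which caching costs the same as not caching."""
    if write_mult <= read_mult:
        raise ValueError("expected write multiplier > read multiplier")
    # Clamp at 0: with no write premium, caching never costs extra.
    return max(0.0, (write_mult - 1.0) / (write_mult - read_mult))


print(f"{breakeven_hit_rate(1.25, 0.1):.1%}")  # 21.7%
print(f"{breakeven_hit_rate(1.0, 0.5):.1%}")   # 0.0%
print(relative_cost(0.0, 1.25, 0.1))           # 1.25
```

With the multipliers listed above, the threshold is about 21.7% for Anthropic and 0% for OpenAI, where a write costs the same as ordinary input. At a 0% hit rate on Anthropic, the cached tokens cost 1.25× what they would uncached.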

This deployment had 0% hit rate.

The fix

Anything that varies per-call must come after the cached portion:

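from datetime import datetime

def build_request(user_msg):
    # Anthropic's Messages API takes the system prompt as a top-level
    # `system` parameter, not as a message with role "system".
    return {
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT_STATIC,  # 12K tokens, never changes
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [
            {
                "role": "user",
                # Per-call data goes after the cached block.
                "content": f"Current time: {datetime.now().isoformat()}\n\n{user_msg}",
            },
        ],
    }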

Now the system prompt is identical across calls → cache hits → 90% savings on the static portion.

After fix:

Day 3 cost:  $14   (proper cache structure)

Takeaway

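Audit what's inside your cached prompt. Anything dynamic (timestamps, request IDs, user IDs, locale) must be outside the cached block. And monitor your cache hit rate: with Anthropic's multipliers (1.25× write, 0.1× read), caching costs you money below roughly 22% (0.25 / 1.15). With the OpenAI multipliers listed above there is no write premium, so caching never costs more than not caching, but a low hit rate still means the expected savings are not showing up. The break-even depends on the provider's read/write multipliers.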
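A minimal sketch of that monitoring, assuming you log the usage object from each Anthropic response as a plain dict. The field names cache_read_input_tokens and cache_creation_input_tokens are the ones the Messages API reports; the function name and the sample numbers are made up for illustration. The rate is token-weighted, which matches a per-request rate when every request caches the same block.

```python
from typing import Iterable, Mapping


def cache_hit_rate(usages: Iterable[Mapping[str, int]]) -> float:
    """Fraction of cacheable input tokens that were read from cache."""
    read = written = 0
    for usage in usages:
        read += usage.get("cache_read_input_tokens", 0)
        written += usage.get("cache_creation_input_tokens", 0)
    total = read + written
    return read / total if total else 0.0


# Made-up sample: one cache write followed by two cache reads.
sample = [
    {"input_tokens": 40, "cache_creation_input_tokens": 12000, "cache_read_input_tokens": 0},
    {"input_tokens": 35, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 12000},
    {"input_tokens": 52, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 12000},
]
print(f"{cache_hit_rate(sample):.0%}")  # 67%
```

A deployment like the one described here would have reported 0% from the first hour, since every response would show cache-creation tokens and no cache-read tokens.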