Prompt Caching in Production: Cutting Costs Without Cutting Quality

Token costs add up fast when you run AI in production. My monthly bill tripled between July and September as I added more agent workflows. Prompt caching is the single biggest lever I found for bringing that bill back down without losing any quality.

What Prompt Caching Does

Most AI workflows send the same system prompt, the same style guide, and the same project context on every request. That is a lot of tokens being reprocessed and billed at full price for no reason. Prompt caching tells the model provider to hold that shared prefix in a short-lived cache so the next request that starts with the same content can reuse it at a steep discount.

On Anthropic's API, cached input tokens cost about 10% of the normal rate. Over a workflow that calls the API fifty times with a shared preamble, that is a massive saving. In my case, prompt caching cut my daily spend by roughly 60% for the workflows that use it.
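To make the arithmetic concrete, here is a rough sketch of the input cost for a burst of calls that share one preamble. The function name, the token counts, and the base rate are made up for illustration. The 10% read multiplier comes from the figure above, and the 25% write premium on the first call is an assumption you should check against current pricing.

```python
def workflow_input_cost(calls, shared_tokens, unique_tokens, base_rate,
                        cached=True, read_multiplier=0.10, write_multiplier=1.25):
    """Estimate input-token cost for `calls` requests sharing one preamble.

    base_rate is the price per input token. The multipliers are assumptions:
    cache reads at 10% of base, the priming write at 125% of base.
    """
    if calls <= 0:
        return 0.0
    if not cached:
        return calls * (shared_tokens + unique_tokens) * base_rate
    # The first call writes the shared preamble into the cache.
    priming = shared_tokens * base_rate * write_multiplier
    # Every later call reads the preamble from the cache.
    reads = (calls - 1) * shared_tokens * base_rate * read_multiplier
    # The per-request part is never cached and always pays the base rate.
    unique = calls * unique_tokens * base_rate
    return priming + reads + unique


# Hypothetical workload: 50 calls, 8,000 shared tokens, 500 unique tokens each,
# at an illustrative rate of 3 units per million input tokens.
RATE = 3.0 / 1_000_000
without_cache = workflow_input_cost(50, 8_000, 500, RATE, cached=False)
with_cache = workflow_input_cost(50, 8_000, 500, RATE, cached=True)
saving = 1 - with_cache / without_cache
```

The saving depends heavily on how large the shared preamble is relative to the per-request part, which is why the real-world number for a mixed set of workflows lands below the headline discount.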

What To Cache

Cache the things that change rarely:

System prompts describing the agent's role
Style guides and voice specifications
Project context like repository structure or domain glossaries
Large reference documents the agent consults across many calls

Do not cache:

Per-user data that changes every request
Conversation history that grows with each turn
Anything smaller than the minimum cacheable block size
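The two lists above can be sketched as a small helper that marks only stable content for caching, and only when it is big enough. This is a sketch under assumptions: the helper name is hypothetical, the 1024-token minimum is a default that varies by model, and the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer. The block shape follows my understanding of Anthropic's Messages API, where the marker goes on the last stable block and covers everything before it.

```python
def build_system_blocks(stable_texts, min_cacheable_tokens=1024):
    """Turn rarely-changing texts into system content blocks.

    The cache marker is added to the last block only if the estimated
    size of all stable text meets the (assumed) minimum cacheable size.
    """
    blocks = [{"type": "text", "text": text} for text in stable_texts]
    # Crude estimate: roughly four characters per token (an assumption).
    estimated_tokens = sum(len(text) for text in stable_texts) // 4
    if blocks and estimated_tokens >= min_cacheable_tokens:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks
```

Per-user data and anything else that changes on every request stays out of `stable_texts` and goes into the user message instead, so it never invalidates the cached prefix.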

Code Structure

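python

# Anthropic's Messages API takes the system prompt as a top-level `system`
# parameter, not as a "system" role message. cache_control sits on a content block.
system = [
    {
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
    },
]
messages = [
    {
        "role": "user",
        "content": current_user_message,
    },
]

# Pass both to the request: client.messages.create(..., system=system, messages=messages)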

That single cache_control marker changes the cost curve for every request that follows. The first call primes the cache, at a small premium over the normal input rate, and every subsequent call within the cache window that sends the same prefix pays the discounted rate.
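It is worth confirming that the cache is actually being hit rather than trusting the marker. A minimal sketch, assuming the response's usage counters are available as a plain dict with the field names `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` (check the API reference for the exact shape your client returns). The function name is hypothetical.

```python
def cache_report(usage):
    """Summarise cache activity from one response's usage counters."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = written + read + uncached
    if read:
        status = "hit"       # prefix was served from the cache
    elif written:
        status = "primed"    # prefix was written to the cache on this call
    else:
        status = "uncached"  # caching did not apply to this request
    return {
        "status": status,
        "cached_fraction": read / total if total else 0.0,
    }
```

If the second call in a burst still reports "primed" or "uncached", the usual suspects are a prefix that differs between calls or a block below the minimum cacheable size.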

The Tradeoff

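Cache entries are short-lived: five minutes by default on Anthropic's API, refreshed each time the entry is read. If your workflow has long gaps between calls, the cache expires and you pay to re-prime it. For bursty workloads it is free money. For sparse workloads it is a wash at best, since cache writes cost slightly more than ordinary input.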
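One way to see where a workload falls is to replay its call timestamps against the cache window. The sketch below is a simplified model, not a billing calculator. It assumes a 300-second window that resets on every hit, reads at 10% of the base rate, and writes at 125%. All three are parameters to adjust, and the function name is hypothetical.

```python
def relative_prefix_cost(call_times, ttl_seconds=300.0,
                         read_multiplier=0.10, write_multiplier=1.25):
    """Cost of the shared prefix with caching, relative to no caching.

    call_times are request timestamps in seconds. A result of 1.0 means
    caching costs the same as not caching; lower is cheaper.
    """
    if not call_times:
        return 1.0
    cost = 0.0
    last_call = None
    for t in sorted(call_times):
        if last_call is not None and t - last_call <= ttl_seconds:
            cost += read_multiplier   # cache still warm: discounted read
        else:
            cost += write_multiplier  # cache cold or expired: re-prime it
        last_call = t                 # assumed: each call refreshes the window
    return cost / len(call_times)


# Bursty: 50 calls, 10 seconds apart. Sparse: 10 calls, 10 minutes apart.
bursty = relative_prefix_cost([i * 10 for i in range(50)])
sparse = relative_prefix_cost([i * 600 for i in range(10)])
```

Under these assumptions the bursty pattern pays a small fraction of the uncached price for the prefix, while the sparse pattern re-primes on every call and comes out slightly above it.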

Read the Anthropic prompt caching docs before you enable it.
