Technical
The Prompt Versioning System I Wish I Had in 2025
Prompts in production are code. Treat them that way and your life is easier. I spent most of 2025 editing prompts inline and losing track of what worked. Over the holidays I finally built a proper versioning system for prompts. Here is what it looks like and why you should copy it rather than inventing your own.
The Shape
Every prompt is a file. Every file has a version header. Every version ships behind a feature flag. Old versions stay queryable for at least 90 days so you can compare outputs before and after changes.
# prompts/classify_doc/v3.yaml
version: 3
model: claude-haiku-4-5
system: |
You are a document classifier...
user_template: |
Classify this document: {{ text }}Why Files Beat Strings in Code
- Diffable in git with meaningful blame
- Reviewable in PRs by stakeholders who cannot read the application code
- Runnable without redeploying the application
- Editable by non-engineers when needed
- Easy to snapshot for compliance or audit
Inline prompts get rewritten by the same developer who wrote them originally. File prompts get a real change history where you can see why each edit was made.
The Runtime
A thin loader reads the yaml, renders the template, and ships it to the API. Maybe thirty lines of Python total. No new dependencies. The loader also caches the compiled prompt per version so there is no performance cost compared to inline strings.
The Eval Hook
Every new version runs against a frozen eval set before it ships. If accuracy drops more than two points, the PR is blocked. This is the gate that catches silent regressions. Without it, a well-intentioned prompt edit can silently degrade production for weeks.
The Rollback Path
Because versions are files with numbers, rollback is a one-line configuration change. No git revert. No emergency deploy. Flip the flag to the previous version and the bug goes away while you debug.
Why I Did Not Build This Sooner
Because the pain was slow. Every bad prompt wasted ten minutes. Over a year that added up to more than the build cost. Sunk cost bias kept me in inline edits too long. The general lesson is that slow pain is harder to notice than sharp pain but often costs more in aggregate.
RELATED READING
The Consulting Shift I Am Making In Year Two
After a year of writing and building, my consulting practice is changing shape. Shorter engagements. Sharper outcomes.
ReadThe Frontend Shift: Shipping Less JavaScript In Year Two
A year ago I reached for Next.js for everything. This year I often reach for nothing.
ReadThe Serverless Lesson I Would Write On A Sticky Note
After a year of shipping serverless projects, one rule explains most of the wins and all of the losses.
Read