Technical
Combining Python and AI for Automated Document Generation
Clients send me raw data. I return polished PDFs. For years that middle step was a manual slog: open a template, paste values, format tables, export. Now it is a Python script that takes two minutes to run and produces brand-perfect output. The trick is combining Python templating with AI for the narrative sections.
The Two Kinds of Content
Every document has two kinds of content:
- Structured data: tables, metrics, dates, names. Needs precise formatting.
- Narrative: executive summary, recommendations, commentary. Needs judgment.
Python handles the structured data perfectly. AI handles the narrative. Combine them in the same pipeline and the whole document writes itself.
The Stack
I use python-docx for Word documents and python-pptx for PowerPoint. Both libraries let you open a template, replace placeholders, and inject rows into tables.
from docx import Document
doc = Document('template.docx')
for para in doc.paragraphs:
if '{{client_name}}' in para.text:
para.text = para.text.replace('{{client_name}}', client.name)
doc.save('output.docx')For narrative sections, I call Claude's API with the structured data as context:
prompt = f"Write a 3-sentence executive summary for {client.name}. "\
f"Their revenue grew {growth}% to ${revenue}. "\
f"Focus on the trajectory, not the absolute number."Why This Works
AI is unreliable at structured output. Ask it for a markdown table of financials and it invents numbers. Ask it for three sentences summarizing numbers you give it, and it delivers every time.
Python is unreliable at natural language. It can concatenate template strings, but the result reads like a robot wrote it. AI handles the human-feeling parts while Python handles the machine-precision parts.
The Full Pipeline
- Load structured data from the client (CSV, API, database)
- Validate with Pydantic
- Generate narrative sections via Claude API
- Inject both into the
.docxtemplate - Save and send
Each step is a few lines. The whole script is under 200 lines. It replaces about 45 minutes of manual work per document.
When to Not Use AI
Legal disclaimers, regulatory language, pricing. Anything where a wrong word creates liability. Python templating only, no AI in that content path.
Keeping Templates Editable
My templates live in Word, not in code. A non-developer can open the template, adjust the branding, and drop in new placeholders with the {{ }} syntax. The Python script does not care about fonts or margins. It only cares about placeholder names. That separation lets the designer and the engineer move independently.
Versioning Outputs
Every generated document gets written with a timestamp in the filename. Old outputs stay on disk for a week in case the client wants to reference a prior version. A cron job deletes anything older than that. Version control for generated artifacts is a nice safety net when a client reports 'the numbers looked different yesterday.'
See the python-docx documentation for the template manipulation patterns that make this pipeline possible.
RELATED READING
The Consulting Shift I Am Making In Year Two
After a year of writing and building, my consulting practice is changing shape. Shorter engagements. Sharper outcomes.
ReadThe Frontend Shift: Shipping Less JavaScript In Year Two
A year ago I reached for Next.js for everything. This year I often reach for nothing.
ReadThe Serverless Lesson I Would Write On A Sticky Note
After a year of shipping serverless projects, one rule explains most of the wins and all of the losses.
Read