Structured Logging Patterns I Adopted This Year
Unstructured logs are how you lose hours in production. You know the symptom: something broke, you grep through CloudWatch, you find nothing useful, you add logs, you redeploy, you wait. Structured logging breaks that loop. Here are the patterns I standardized on this year.
The Core Principle
Every log line is a JSON object with a fixed set of top-level fields and a flexible payload. That structure lets CloudWatch Logs Insights, Datadog, or any log platform actually filter and aggregate.
    import json
    import time

    def log_event(event: str, **kwargs):
        # Fixed top-level fields first, then the flexible payload.
        record = {
            'timestamp': time.time(),
            'event': event,
            'service': 'posts-api',
            **kwargs,
        }
        # One JSON object per line on stdout.
        print(json.dumps(record))

The Required Fields
Every log line I write has at minimum:
- timestamp: epoch seconds
- event: short identifier like post.created or email.send.failed
- service: which service produced the log
- level: info, warn, error
- request_id: ties logs from one request together
With just those five fields I can answer most production questions.
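As a rough illustration, here is one way a logger could enforce the five fields listed above. This is a sketch, not a definitive implementation: the `build_record` helper, the `VALID_LEVELS` set, and the choice to reject unknown levels are all assumptions made for the example.

```python
import json
import time

# Assumed set of allowed levels, taken from the field list above.
VALID_LEVELS = {"info", "warn", "error"}

def build_record(event: str, level: str, request_id: str, **payload) -> dict:
    """Build a log record that always carries the five required fields."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    record = dict(payload)  # flexible payload goes in first...
    record.update({         # ...so the required fields win on key collisions
        "timestamp": time.time(),
        "event": event,
        "service": "posts-api",
        "level": level,
        "request_id": request_id,
    })
    return record

def log_event(event: str, level: str, request_id: str, **payload) -> None:
    # One JSON object per line on stdout.
    print(json.dumps(build_record(event, level, request_id, **payload)))

log_event("post.created", "info", "req-123", post_id=42)
```

Making `level` and `request_id` positional parameters means a call site cannot forget them, which is the point of having required fields at all.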
The Request-ID Pattern
The single highest-leverage change was propagating a request ID through every log call. A user reports a bug, they send me the request ID from the response header, I filter the logs by that ID, and I see every step of their request. Thirty seconds instead of thirty minutes.
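One way to propagate the ID without threading it through every function signature is Python's standard `contextvars` module. The sketch below assumes a hypothetical `start_request` hook that your framework calls at the top of each handler. How the ID arrives (a header, a Lambda context field) is left out because it depends on your stack.

```python
import contextvars
import json
import time
import uuid

# Holds the current request's ID; each thread or asyncio task has its own context.
_request_id = contextvars.ContextVar("request_id", default="unknown")

def start_request(incoming_id=None):
    """Call once at the top of a handler; reuses an incoming ID if one was sent."""
    rid = incoming_id or uuid.uuid4().hex
    _request_id.set(rid)
    return rid

def log_event(event, level="info", **payload):
    """Emit one JSON log line stamped with the current request ID."""
    record = {
        "timestamp": time.time(),
        "event": event,
        "service": "posts-api",
        "level": level,
        "request_id": _request_id.get(),
        **payload,
    }
    line = json.dumps(record)
    print(line)
    return line

start_request("req-abc")
log_event("post.created", post_id=1)
```

Returning the same ID in a response header is what lets a user hand it back to you later.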
What Not to Log
- Passwords, tokens, API keys (obviously)
- Full request bodies unless sanitized
- PII beyond what is necessary
- Log-per-loop-iteration patterns (hello, bill explosion)
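For the first two items in the list above, a small redaction pass before logging is one option. This is a minimal sketch: the `SENSITIVE_KEYS` set and the `sanitize` helper are illustrative names, and a key-name deny-list will not catch secrets embedded inside free-text values.

```python
# Assumed deny-list of key names; extend it for your own payloads.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "secret"}

def sanitize(value):
    """Return a copy with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

body = {"user": "ana", "password": "hunter2", "items": [{"Token": "abc", "n": 1}]}
print(sanitize(body))
```

The function builds new containers rather than mutating its input, so the original request body is still intact for the handler.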
The Shape Rule
Logs and metrics are different. Logs are strings and events. Metrics are numbers over time. Do not try to use logs as metrics. Emit metrics separately using CloudWatch Embedded Metric Format or your platform equivalent.
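For reference, a CloudWatch Embedded Metric Format record is itself a JSON line with an `_aws` metadata block that tells CloudWatch which keys to extract as metrics. The sketch below builds one by hand. The `emf_metric` helper, the `PostsApi` namespace, and the `RequestLatency` metric name are assumptions for illustration. In practice an official EMF client library may be a better fit.

```python
import json
import time

def emf_metric(name, value, unit="Milliseconds", namespace="PostsApi", **dimensions):
    """Build one Embedded Metric Format record for a single metric value."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # EMF expects epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set, by key name
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        name: value,   # the metric value lives at the top level
        **dimensions,  # so do the dimension values (strings)
    }

print(json.dumps(emf_metric("RequestLatency", 41.7, service="posts-api")))
```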
Structured logging is not a silver bullet. It is a small discipline that compounds over the life of a project. Add it on day one, not on day one hundred.
The Log Levels Problem
Most apps have too many info logs and too few warn logs. The fix is a discipline: info is for normal flow, warn is for recoverable issues, error is for things a human needs to see. If every log is info, nothing is actionable. Spend time tuning levels until warn and error are rare enough to be meaningful.
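A quick way to check whether levels are tuned is to measure the share of lines at each level over a sample of logs. The `level_share` helper below is a hypothetical sketch that assumes one JSON object per line with a `level` field, as described earlier.

```python
import json
from collections import Counter

def level_share(lines):
    """Return the fraction of log lines at each level, given JSON log lines."""
    counts = Counter(json.loads(line).get("level", "missing") for line in lines)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}

sample = ['{"level": "info"}', '{"level": "info"}', '{"level": "warn"}', '{"event": "x"}']
print(level_share(sample))
```

A large "missing" share is its own finding: those lines are skipping a required field.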
What I Added Late
- Metric logs for latency histograms (should have had earlier)
- Correlation IDs across service boundaries (retrofitted painfully)
- Sampling for high-volume log paths (keeps CloudWatch bills reasonable)
Each of these produced an obvious improvement the week it shipped. Each should have been in the initial design, not added in response to an outage.
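Sampling can be sketched as below, under two assumptions that are not from the setup described above: warn and error lines are never dropped, and the keep-or-drop decision is derived from a hash of the request ID so that a sampled request keeps all of its lines. The `should_log` name and the 10% default are illustrative.

```python
import zlib

def should_log(level, request_id, sample_rate=0.1):
    """Keep every warn/error line; keep a stable fraction of info lines."""
    if level != "info":
        return True
    # Hash the request ID so all lines from one request are kept or dropped together.
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < sample_rate * 10_000

if should_log("info", "req-123"):
    print('{"event": "cache.hit", "level": "info", "request_id": "req-123"}')
```

Hashing rather than calling a random number generator keeps the decision deterministic, so filtering by request ID never shows a half-logged request.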
See the AWS structured logging guide for Lambda-specific patterns.