Testing Strategies That Hold Up in Production
Test advice online is either dogmatic or useless. 'Always do TDD'. 'One hundred percent coverage'. 'Unit tests only'. After eight months of shipping a serverless platform with real users, I have a testing strategy that is pragmatic and has caught real bugs before production.
The Three-Tier Pyramid I Actually Use
Tier one: pure function tests. Fast, boring, many. Every business rule, validation helper, or transformation function has a test. Milliseconds per test. Thousands of them. No mocks, no fixtures, just input/output.
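A tier-one test can be as small as the sketch below. The discount rule and the name `apply_discount` are invented for illustration and are not taken from the platform described here; the point is only the shape: a pure function, plain inputs, plain assertions, nothing to set up.

```python
from decimal import Decimal


# Hypothetical business rule, made up for this example: orders of 100.00
# or more get 10% off; smaller orders are returned unchanged.
def apply_discount(total: Decimal) -> Decimal:
    if total < 0:
        raise ValueError("total must be non-negative")
    if total >= Decimal("100.00"):
        return (total * Decimal("0.90")).quantize(Decimal("0.01"))
    return total


def test_no_discount_below_threshold():
    assert apply_discount(Decimal("99.99")) == Decimal("99.99")


def test_discount_at_threshold():
    assert apply_discount(Decimal("100.00")) == Decimal("90.00")


def test_negative_total_rejected():
    try:
        apply_discount(Decimal("-1"))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative total")
```

The tests use bare `assert` statements, so they should run under pytest or any runner that collects `test_*` functions.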
Tier two: handler tests with fake boundaries. Each Lambda or endpoint has a test that runs the full handler with a fake database and fake external services. Fakes are in-memory Python classes, not mocks. They behave like the real thing well enough for the logic under test.
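As a rough sketch of what a tier-two test might look like: the handler below receives its table as a dependency, and the test passes in an in-memory fake. `FakeUserTable`, `make_handler`, and the event shape (a `pathParameters` dict, similar to an API Gateway proxy event) are assumptions made for this example, not the platform's real code.

```python
import json


class FakeUserTable:
    """In-memory stand-in for a key-value table (hypothetical interface)."""

    def __init__(self):
        self._items = {}

    def put(self, user_id, item):
        # Store a copy so callers cannot mutate stored state by accident.
        self._items[user_id] = dict(item)

    def get(self, user_id):
        item = self._items.get(user_id)
        return dict(item) if item is not None else None


def make_handler(table):
    """Build a handler bound to whichever table (real or fake) is passed in."""

    def handler(event, context):
        user_id = event["pathParameters"]["id"]
        user = table.get(user_id)
        if user is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": json.dumps(user)}

    return handler


def test_returns_existing_user():
    table = FakeUserTable()
    table.put("u1", {"id": "u1", "name": "Ada"})
    response = make_handler(table)({"pathParameters": {"id": "u1"}}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"id": "u1", "name": "Ada"}


def test_missing_user_is_404():
    response = make_handler(FakeUserTable())({"pathParameters": {"id": "nope"}}, None)
    assert response["statusCode"] == 404
```

Unlike a mock, the fake holds real state: the test asserts on what the handler returns, not on which methods it called.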
Tier three: one smoke test per critical path. A single end-to-end test runs against the real stack deployed to a test environment. Only the top five user journeys. This is where network, permissions, and config bugs show up: the ones unit tests cannot catch.
The Coverage Rule
I do not chase a coverage number. I require that every bug fix ships with a test that would have caught it. That means coverage grows organically where real bugs happen, instead of uniformly across code that has never failed.
bug report -> reproduce with a test -> fix -> confirm test passes -> ship
That loop is the single most valuable testing discipline I have. Every fix becomes a permanent regression guard for exactly the thing that actually went wrong.
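To make the loop concrete, here is a minimal sketch built around an invented bug report: sign-ups with whitespace-padded email addresses created duplicate accounts. The function `normalize_email` and the bug itself are hypothetical. The test is written first and fails against the buggy version, then stays in the suite after the fix.

```python
def normalize_email(raw: str) -> str:
    # Fixed version. The hypothetical buggy version only lowercased, so
    # " Ada@Example.com " and "ada@example.com" counted as different users.
    return raw.strip().lower()


def test_regression_padded_email_matches_existing_account():
    # Reproduces the report exactly: same address, padded and mixed case.
    assert normalize_email(" Ada@Example.com ") == normalize_email("ada@example.com")
```

Naming the test after the reported symptom makes it easier to trace a future failure back to the original bug.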
What I Stopped Doing
Writing tests before the design is stable. Mocking every database call. Integration tests that spin up Docker containers for every run. All three were costing me more than they saved. See Martin Fowler on test doubles for the classic taxonomy. My rule: prefer fakes over mocks, always.
The Honest Metric
The metric is not coverage. It is mean-time-to-reproduce when a bug is reported. If I can write a failing test in under fifteen minutes, my test infrastructure is healthy. If it takes me an hour, I have friction to remove.
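One way to track this, assuming you note when you start reproducing a bug and when the failing test first exists, is sketched below. The record format and function names are assumptions; only the fifteen-minute threshold comes from the rule above.

```python
from datetime import datetime, timedelta
from statistics import mean

# Threshold taken from the rule above: under fifteen minutes is healthy.
HEALTHY_THRESHOLD = timedelta(minutes=15)


def mean_time_to_reproduce(records):
    """records: non-empty list of (started_at, failing_test_written_at) pairs."""
    durations = [(done - start).total_seconds() for start, done in records]
    return timedelta(seconds=mean(durations))


def is_healthy(records) -> bool:
    return mean_time_to_reproduce(records) < HEALTHY_THRESHOLD


# Example data, invented for illustration: three bugs that took
# 10, 20, and 60 minutes to reproduce.
start = datetime(2024, 1, 1, 9, 0)
example_records = [
    (start, start + timedelta(minutes=10)),
    (start, start + timedelta(minutes=20)),
    (start, start + timedelta(minutes=60)),
]
```

With the example data the mean is thirty minutes, so `is_healthy(example_records)` is `False`: one slow reproduction is enough to signal friction worth removing.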
Tests are infrastructure. Maintain them like infrastructure.
Deleting Tests Is a Practice Too
Tests that have not failed in a year on stable code are noise, not safety. Once a quarter, I review test execution stats and delete tests that have neither caught a regression nor covered new code. Deleting tests is counterintuitive, but a test suite that grows forever becomes a slower deploy pipeline and a friction tax on every refactor. Keep the suite lean; the tests that remain earn their keep.
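The quarterly review could start from a filter like the one sketched here. It is one interpretation of the rule above, with a hypothetical record type (`SuiteEntry`) and the assumption that your CI can export each test's last failure date and the date the code it covers last changed. The output is a list of candidates for human review, not an automatic delete list.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SuiteEntry:
    """Hypothetical per-test stats, assumed to come from CI history."""
    name: str
    last_failure: Optional[date]  # None means the test has never failed
    code_last_changed: date       # last change to the code the test covers


def deletion_candidates(entries: List[SuiteEntry], today: date) -> List[str]:
    cutoff = today - timedelta(days=365)
    return [
        e.name
        for e in entries
        # No failure in the past year AND the covered code has been stable.
        if (e.last_failure is None or e.last_failure < cutoff)
        and e.code_last_changed < cutoff
    ]
```

A test that covers recently changed code is kept even if it has never failed, which matches the "nor covered new code" half of the rule.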