Technical
Testing in the Agent Era: What Actually Changed
Two years in, testing feels different than it used to. Not because the tools changed, but because agents generate code at a rate that used to require a team. The bottleneck is no longer writing tests. It is deciding which tests matter. Here is how my approach shifted.
The Old Bottleneck
Pre-agent era, writing tests took as long as writing the code. Most consultants skipped tests because budgets did not cover double the time. Tests lived on a wishlist nobody got to.
The New Bottleneck
Agents write tests for free. Generating fifty test cases takes a prompt. The question flipped: with unlimited tests, which tests actually matter?
The Test Tiering I Use Now
- Tier 1: contract tests for public APIs (always)
- Tier 2: integration tests for critical paths (always)
- Tier 3: unit tests for pure logic (when free to generate)
- Tier 4: exhaustive edge cases (only on modules touched by finance, auth, or billing)
Agents generate Tier 3 and 4 on demand. I write Tier 1 and 2 myself or review them carefully.
What Changed in My Testing Discipline
- I read tests more than I write them (agents write, I verify)
- I delete tests that do not encode intent, even if they pass
- I value coverage less and critical-path verification more
- I expect red first always, regardless of who writes the test
The Red-First Rule
Every test must fail at least once before passing. Non-negotiable, whether I or an agent wrote it. Tests that have never been red are not tests. They are assertions that happen to be true right now.
Example Failure Mode
I caught an agent generating a test like this:
def test_get_user():
user = get_user('x')
assert user is not None or user is NonePasses. Tests nothing. The agent had filled a slot without thinking. Red-first would have caught it immediately because the assertion cannot fail.
The Coverage Myth
Ninety percent coverage used to be a goal. Now it is a trivially achievable metric that tells you almost nothing. I have seen agent-generated test suites with 95 percent coverage that test almost no intent. Coverage is a floor, not a ceiling, and the ceiling is the harder question.
What I Tell Teams
Spend your test budget on the tests that encode business rules. Let agents generate everything else. The old ratio of time-on-tests to time-on-code is obsolete. The new ratio is time-on-intent to time-on-generation, and intent is where the thinking lives.
See Kent Beck's writing on test desiderata for the underlying frame I still rely on.
RELATED READING
The Consulting Shift I Am Making In Year Two
After a year of writing and building, my consulting practice is changing shape. Shorter engagements. Sharper outcomes.
ReadThe Frontend Shift: Shipping Less JavaScript In Year Two
A year ago I reached for Next.js for everything. This year I often reach for nothing.
ReadThe Serverless Lesson I Would Write On A Sticky Note
After a year of shipping serverless projects, one rule explains most of the wins and all of the losses.
Read