Don't accept slop on the first pass

May 27, 2026 · 4 min read

#claude-code #ai #developer-tools #practice

The AI's summary of what it did is not the same as what it did.

Left: an AI summary saying "Added input validation and improved error handling for the user registration flow" — confident, plausible, four lines. Right: the actual diff — 60 lines of try/catch wrapping internal calls, three pieces of validation for arguments that come from already-validated boundaries, and a comment explaining that the change is safe.

Summary vs diff

The summary is a confident retelling. It's often correct. It's sometimes wrong about exactly the parts that matter — the unrequested refactor, the speculative validation, the comment block where a rename would do.

The summary feels like the artifact. The diff is the artifact. Six months from now, the summary will be forgotten and the code will still be there, shaping the next person's reading.

Three tells of slop

The patterns are predictable enough to recognize in seconds:

The shape is wrong. You asked for a fix, you got a refactor. You asked for a rename, you got a new abstraction. You asked for a comment, you got a wrapper function plus a test for the wrapper. The diff "works." It's not what you wanted.
Defensive code for cases that can't happen. Try/catch around internal function calls. Argument validation for parameters coming from your own code. Fallback paths nothing will ever hit. The vibe is "I'm being careful." The reality is dead code that future readers will treat as load-bearing.
Tests that test the mock, not the behavior. The mock returns { ok: true }. The test asserts the result is { ok: true }. If the implementation under test were replaced with return null, the test would still pass — because the mock is doing all the work. This isn't a test. It's a placebo.
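Here is roughly what the second tell looks like on the page. This is a minimal sketch with invented names (`parseRegistration`, `greetingSloppy`, `greetingTight`), assuming input is validated once at the boundary; it is not taken from any real codebase.

```typescript
// Hypothetical registration flow; every name here is made up for illustration.
interface Registration {
  email: string;
  displayName: string;
}

// Boundary: the one place untrusted input gets validated.
function parseRegistration(input: unknown): Registration {
  if (typeof input !== "object" || input === null) {
    throw new Error("registration must be an object");
  }
  const { email, displayName } = input as Record<string, unknown>;
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("invalid email");
  }
  if (typeof displayName !== "string" || displayName.trim() === "") {
    throw new Error("invalid display name");
  }
  return { email, displayName: displayName.trim() };
}

// Sloppy: re-checks what the boundary already guaranteed and wraps
// code that cannot throw in a try/catch.
function greetingSloppy(reg: Registration): string {
  try {
    if (!reg || typeof reg.displayName !== "string") {
      return "Hello!"; // fallback path nothing will ever hit
    }
    return `Hello, ${reg.displayName}!`;
  } catch {
    return "Hello!";
  }
}

// Tight: trusts the type it was handed.
function greetingTight(reg: Registration): string {
  return `Hello, ${reg.displayName}!`;
}
```

For every value `parseRegistration` can produce, the two greeting functions return the same string. The only difference is eight extra lines that the next reader has to stop and wonder about.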

The thing all three have in common: they look productive in the summary, and they're dead weight in the diff.
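The third tell is the easiest to check mechanically: swap in a broken implementation and see whether the test notices. The sketch below uses plain functions instead of a test framework, and `register`, `registerBroken`, `placeboTest`, and `behaviorTest` are all hypothetical names chosen for the illustration.

```typescript
// Hypothetical names throughout; no test framework, just functions returning pass/fail.
type Result = { ok: boolean };
type Gateway = { send: (email: string) => Result };
type Impl = (email: string, gateway: Gateway) => Result | null;

// Implementation under test: refuses obviously bad input, otherwise sends.
function register(email: string, gateway: Gateway): Result {
  if (!email.includes("@")) return { ok: false };
  return gateway.send(email);
}

// Deliberately broken stand-in, used to check whether a test can fail at all.
function registerBroken(_email: string, _gateway: Gateway): Result | null {
  return null;
}

// Placebo: calls the implementation, then asserts on what the mock returns.
function placeboTest(impl: Impl): boolean {
  const mock: Gateway = { send: () => ({ ok: true }) };
  impl("a@example.com", mock); // return value is ignored
  return mock.send("a@example.com").ok === true;
}

// Behavioral: asserts on the implementation's own output and its effect on the mock.
function behaviorTest(impl: Impl): boolean {
  const sent: string[] = [];
  const mock: Gateway = {
    send: (email) => {
      sent.push(email);
      return { ok: true };
    },
  };
  const good = impl("a@example.com", mock);
  const bad = impl("not-an-email", mock);
  return (
    good !== null && good.ok === true &&
    bad !== null && bad.ok === false &&
    sent.length === 1 && sent[0] === "a@example.com"
  );
}
```

`placeboTest(register)` and `placeboTest(registerBroken)` both pass, which is the problem. `behaviorTest(register)` passes and `behaviorTest(registerBroken)` fails, which is what a test is for.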

Why the summary is marketing

The summary is generated by the model right after it acts, summarizing its own work. Models trained to be helpful tend to describe their work in helpful-sounding terms. "Added validation" is the kind of phrase that sounds like a step forward. "Wrote 60 lines of dead defensive code" never appears in a summary, even when it's the more accurate description.

This isn't dishonesty. It's the natural shape of self-narration. We do the same thing on our own PRs.

The corrective is mechanical: don't accept the summary as the review. Read the diff. Every line.

The 30-second habit

For diffs under 200 lines, the review pass is:

Open the diff. Read every changed line. Not a summary, not a folded view: the lines themselves.
For each block, ask: would I have written this?
Reject any line that fails the test. Either delete it yourself or ask for a tighter pass.

For larger diffs, the habit is the same but harder, and larger diffs are exactly where slop hides best. They are where the four-line cleanup turns into a 200-line refactor.

If a diff is "too long to review," the right answer is "then it's too long to merge." Ask for it broken up, or scoped down.
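If you want the size check to be mechanical, something like the following would do. It is a rough sketch that assumes the text comes from `git diff` (unified format with `diff` and `@@` header lines). Counting only added and removed lines, and reusing 200 as the cutoff, are my assumptions for the example. `countChangedLines` and `reviewVerdict` are made-up names.

```typescript
// Assumed cutoff, borrowed from the "under 200 lines" figure above.
const REVIEW_LIMIT = 200;

// Counts added and removed lines inside hunks of a git-style unified diff.
// File headers ("--- a/x", "+++ b/x") sit before the first "@@" of each file,
// so they are never counted.
function countChangedLines(diff: string): number {
  let inHunk = false;
  let changed = 0;
  for (const line of diff.split("\n")) {
    if (line.startsWith("diff ")) {
      inHunk = false; // a new file's header block begins
    } else if (line.startsWith("@@")) {
      inHunk = true; // hunk header; body lines follow
    } else if (inHunk && (line.startsWith("+") || line.startsWith("-"))) {
      changed++;
    }
  }
  return changed;
}

// Turns the count into the decision described in the text.
function reviewVerdict(diff: string, limit: number = REVIEW_LIMIT): string {
  const changed = countChangedLines(diff);
  return changed < limit
    ? `${changed} changed lines: read every one`
    : `${changed} changed lines: ask for it to be split or scoped down`;
}
```

Feeding it the output of `git diff` before staging gives a number to argue with. It does not replace the reading; it only tells you when to push back before you start.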

When to relax it

I don't apply this to:

Throwaway scripts I'll run once and delete.
Generated boilerplate I've explicitly asked for (a migration scaffold, a test fixture).
The output of formatters or linters.

Everywhere else, the diff goes through the 30-second pass before it gets staged.

The rule

The summary is marketing. The diff is the product. The review you owe is to the diff.

Slop accumulates. By the time you notice the wrapper class nobody can name, it's been imported by twelve files. The 30 seconds you didn't spend reading turns into the 30 minutes you spend deleting.

Six months later, nobody remembers the summary. The code is what's left.