Letting Playwright take screenshots made my coding agent 10x better at fixing UI breakage

TL;DR

A coding agent can read your DOM and code, but it never actually sees the rendered page. So it misses visual breakage.
I set up Playwright to do one thing: take screenshots of given pages. Then I let the agent capture, inspect, and fix in a loop.
Once the "change it → take a shot → judge it yourself" loop was running, fixing UI breakage got dramatically easier. It felt like 10x.
The point isn't a fancy E2E suite. It's giving the agent a feedback loop it can look at on its own.

Fixing UI breakage is the painful part

Getting a coding agent to write ordinary logic has gotten pretty stable.
With tests in place, the agent can drive red to green on its own.

But UI breakage was different.

A button overflows ever so slightly
Text dissolves into the background, but only in dark mode
The layout splits into two rows at a certain width
gap doesn't apply and elements stick together

This kind of bug usually can't be caught by reading code.
And the nasty part: tell the agent to "fix it," and it can't verify whether it's actually fixed.

So I'd end up opening a browser, checking by eye, and bouncing it back with "still broken."
That back-and-forth was eating most of my time.

The agent isn't looking at the screen

Obvious in hindsight: what the agent sees is basically text.

Source code
DOM structure
Class names

So it knows "this div has these Tailwind classes."
But it never sees how that actually renders.

When a human fixes frontend code, they're almost always looking at the screen.
And yet I wasn't handing the agent a screen at all. That was the root of the mismatch.

What I did: hand it a way to take screenshots

I didn't want to build a grand E2E suite.
I wanted the minimum: "take a screenshot of a given page and write it to a file."

Add Playwright.

npm install -D @playwright/test
npx playwright install chromium

Drop in a spec that just lists targets and loops over them.

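// e2e/screenshots.spec.ts
// Captures each page in light and dark mode. Relative paths resolve against baseURL in the Playwright config.
import { test } from '@playwright/test';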

const pages = [
  { name: 'home', path: '/' },
  { name: 'blog', path: '/blog' },
  { name: 'blog-post', path: '/blog/knip-dead-code-cleanup' },
  { name: 'projects', path: '/projects' },
];

const themes = ['light', 'dark'] as const;

for (const { name, path } of pages) {
  for (const theme of themes) {
    test(`screenshot ${name} (${theme})`, async ({ page }) => {
      await page.emulateMedia({ colorScheme: theme });
      await page.goto(path);
      await page.waitForLoadState('networkidle');
      await page.screenshot({
        path: `e2e/__shots__/${name}-${theme}.png`,
        fullPage: true,
      });
    });
  }
}
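
If widths matter too (the "splits into two rows at a certain width" case), the target list can be built by a small pure function instead of yet another nested loop in the spec. This is only a sketch: ShotTarget and buildTargets are hypothetical names, and the widths are example values, not ones from my setup.

```typescript
// Hypothetical helper: expands pages × themes × widths into a flat list of capture targets.
type Theme = 'light' | 'dark';

interface PageDef {
  name: string;
  path: string;
}

interface ShotTarget {
  title: string; // test title
  path: string;  // URL path to visit
  theme: Theme;
  width: number; // viewport width in CSS pixels
  file: string;  // where the PNG would be written
}

function buildTargets(
  pages: PageDef[],
  themes: Theme[],
  widths: number[],
  outDir = 'e2e/__shots__',
): ShotTarget[] {
  const targets: ShotTarget[] = [];
  for (const { name, path } of pages) {
    for (const theme of themes) {
      for (const width of widths) {
        targets.push({
          title: `screenshot ${name} (${theme}, ${width}px)`,
          path,
          theme,
          width,
          file: `${outDir}/${name}-${theme}-${width}.png`,
        });
      }
    }
  }
  return targets;
}

// Example widths are assumptions; pick the ones where your layout actually breaks.
const targets = buildTargets(
  [{ name: 'home', path: '/' }, { name: 'blog', path: '/blog' }],
  ['light', 'dark'],
  [375, 1280],
);
console.log(targets.length); // 2 pages × 2 themes × 2 widths = 8
```

Inside the spec, each target would become one test that calls page.setViewportSize({ width, height }) with a height of your choosing before page.goto(path).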

Add a command to package.json.

{
  "scripts": {
    "shots": "playwright test e2e/screenshots.spec.ts"
  }
}

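Now npm run shots writes a light and a dark screenshot of each page under e2e/__shots__/.
Because the spec uses relative paths, the site has to be running and baseURL has to be set in the Playwright config.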
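
One way to make the relative paths in the spec resolve is a small config file. A minimal sketch, assuming the site runs locally on port 3000 and starts with npm run dev. Both are placeholders for whatever your project uses.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  use: {
    // Assumed local URL; relative paths like '/blog' resolve against this.
    baseURL: 'http://localhost:3000',
  },
  webServer: {
    // Assumed dev command; Playwright starts it and waits for the URL to respond.
    command: 'npm run dev',
    url: 'http://localhost:3000',
    // Reuse a server that is already running instead of starting a second one.
    reuseExistingServer: true,
  },
});
```

With webServer in place, the agent can run the screenshot command without first having to start the dev server itself.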

Hand the loop to the agent

This is the real point: taking the screenshots alone does nothing.
You have to show the captured images back to the agent and let it judge them.

Modern coding agents (like Claude Code) can read image files.
So I shaped the instructions like this.

1. Change the UI
2. Run npm run shots
3. Open the resulting e2e/__shots__/*.png and inspect them yourself
4. If something's broken, fix it and go back to 2
5. When nothing's broken, you're done

That alone flipped the behavior.

The agent would notice on its own that "the button is overflowing to the right," add a max-width, shoot again, and report "fixed."
The check-and-retry exchange a human used to run now happened entirely inside the agent's loop.

Why it felt like 10x

The big win was fewer bounce-backs.

Before, every time I said "fix it," I had to open a browser and check.
One fix required one human eyeball pass.

After handing over the screenshot loop, the agent does several round trips by itself before bringing me "done."
Human verification dropped to a single pass at the end.

There were some nice side effects too.

Easy-to-miss issues like dark-mode breakage get caught early
"It breaks at this width" gets preserved as a screenshot with its repro condition
Easy to paste before/after images during review
Showing an image beats explaining "center this" in prose

For the agent and for me, images turned out to be the strongest shared language.

Where it bit me

It's not magic, so a few caveats.

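Timing. Shoot mid-animation or mid-font-load and the frames come out different from run to run. waitForLoadState('networkidle') only tells you the network has gone quiet; it does not wait for animations to finish. Add explicit waits when needed (for example on document.fonts.ready), or pass animations: 'disabled' to page.screenshot().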
Some diffs are better judged by a human. The agent is good at "is it broken," but "is it good design" is a different question. The last look is human.
Don't over-capture. Shooting every page × width × theme is slow, and too many images means you miss things. Start with the few that break easily.
You often don't need pixel-perfect comparison. Strict toHaveScreenshot() diffing is handy, but "shoot and eyeball it" is plenty effective to begin with.
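
For the timing caveat, one way to steady a shot is to wait for web fonts and freeze animations before capturing. A minimal sketch under a few assumptions: stableShot and PageLike are hypothetical names, and PageLike declares only the two Playwright Page methods the helper uses so the snippet stands alone. In a real spec you would pass the page fixture.

```typescript
// Minimal structural type covering only the Page methods used below.
// Playwright's real Page has both methods; the name PageLike is hypothetical.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
  screenshot(options: {
    path: string;
    fullPage: boolean;
    animations: 'disabled' | 'allow';
  }): Promise<unknown>;
}

// Hypothetical helper: wait for fonts, then capture with animations frozen.
async function stableShot(page: PageLike, file: string): Promise<void> {
  // document.fonts.ready resolves once pending font loads have finished.
  // The expression runs in the browser; returning true keeps the result serializable.
  await page.evaluate('document.fonts.ready.then(() => true)');
  await page.screenshot({
    path: file,
    fullPage: true,
    // Playwright screenshot option that stops CSS animations and transitions.
    animations: 'disabled',
  });
}
```

In the spec, this would replace the direct page.screenshot() call, for example await stableShot(page, 'e2e/__shots__/home-dark.png'). Pages that keep polling or streaming may still need a page-specific wait on top of this.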

Wrap-up

The reason fixing UI breakage hurt was that the agent wasn't looking at the screen.

You don't need a grand E2E setup. Just hand it the minimal loop: take a screenshot, show it back to the agent.
That alone closed the fix cycle inside the agent and made it feel 10x easier.

If you're leaning on an agent for frontend and struggling with breakage, it's worth starting with a single Playwright shot.