I Spent 3 Weeks Letting AI Rebuild Our Design System. The Lesson Cost Me More Than I Expected.

AI Design System Field Note

May 30, 2026

Components audited: 41
Component import rate: 84%
Token adoption baseline: 28.8%

I spent three weeks letting AI rebuild our design system from zero. The mistake I made is the same one many design teams are about to make right now.
I’m glad I made it first. Here’s what happened.

On paper, our design system looked healthy: 41 components, an 84% import rate, and a team that knew how to use it.
But token adoption was only 28.8%. Most of the codebase still relied on hand-typed colors, spacing, and sizing, screen by screen.
So I asked: what if AI built the new version from scratch? Faster, cleaner, no legacy debt. I was wrong.
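
For readers who want to see where a number like the 28.8% token adoption figure can come from, here is a minimal sketch of one way to estimate it. It is not the audit we ran. It assumes tokens are referenced as CSS custom properties such as var(--color-text) and treats hex colors and px/rem lengths as hand-typed values; the patterns, the function name token_adoption, and the sample styles are all hypothetical.

```python
import re

# Assumption: tokens appear as CSS custom properties, e.g. var(--space-2).
TOKEN_REF = re.compile(r"var\(--[\w-]+\)")
# Assumption: "hand-typed" means a hex color or a px/rem length literal.
HARD_CODED = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+(?:\.\d+)?(?:px|rem)\b")


def token_adoption(css: str) -> float:
    """Share of style values that reference a token instead of a literal."""
    tokens = len(TOKEN_REF.findall(css))
    literals = len(HARD_CODED.findall(css))
    total = tokens + literals
    return tokens / total if total else 0.0


# Hypothetical stylesheet: two token references, two hard-coded values.
sample = """
.card { color: var(--color-text); padding: 16px; }
.badge { background: #ff3366; margin: var(--space-2); }
"""

print(f"Token adoption: {token_adoption(sample):.1%}")
```

A regex count like this is crude (it ignores shorthand values, named colors, and values set in JavaScript), but it is enough to track the trend from one audit to the next.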

The mistake

Visual consistency collapsed with every iteration.
Fixing one component broke ones we’d already finished.
We got trapped in an infinite revision loop.
Old code and new code collided in production.

The root cause wasn’t AI. It was the absence of a visual ground truth. Without a Figma source of truth, AI had nothing stable to check itself against — and non-developers had no reliable way to verify whether the generated code was production-ready.
So we rebuilt the workflow around one rule:
Design sets the ground truth. AI follows it. Developers own the code.
We also found a quieter problem: designers and developers had different mental models of the same system. That gap is what users were feeling when they said, “The UI changes too often. It’s confusing to use.”

The fix was a Design System Guardian Skill: a workflow step that forces AI to check design principles before touching text, style, components, or logic. Not as a courtesy. As structure.
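
To make the idea concrete, here is a minimal sketch of what such a guard step could look like in code. It is an illustration under stated assumptions, not the actual Skill: the token names in DESIGN_TOKENS, the Violation type, and the function check_against_ground_truth are hypothetical, and the check covers only style values, not text, components, or logic.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ground truth exported from the design file: token name -> value.
DESIGN_TOKENS = {
    "color-primary": "#1a73e8",
    "space-2": "8px",
    "radius-md": "6px",
}


@dataclass
class Violation:
    prop: str
    value: str
    reason: str


def check_against_ground_truth(proposed: dict[str, str]) -> list[Violation]:
    """Return every proposed declaration that is not backed by a known token."""
    violations = []
    for prop, value in proposed.items():
        # Anything that is not a var(--token) reference counts as hard-coded.
        if not (value.startswith("var(--") and value.endswith(")")):
            violations.append(Violation(prop, value, "hard-coded value"))
            continue
        name = value[len("var(--"):-1]
        if name not in DESIGN_TOKENS:
            violations.append(Violation(prop, value, "unknown token"))
    return violations


# Hypothetical AI-proposed change: one valid token, one literal, one unknown token.
change = {
    "color": "var(--color-primary)",
    "padding": "12px",
    "border-radius": "var(--radius-lg)",
}

problems = check_against_ground_truth(change)
for p in problems:
    print(f"{p.prop}: {p.value} ({p.reason})")
print("accepted" if not problems else "rejected")
```

The design choice that matters is the order: the check runs before a change is accepted, and the change is rejected when any violation is found, so the ground truth acts as a gate rather than a suggestion.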

Result

Token adoption: +61% from the 28.8% baseline.
Hard-coded values: −76%.
Handoff time: 30 minutes → 10 minutes per screen.

The lesson I’d give any founder building with AI right now: AI doesn’t fail for lack of capability. It fails when there is no ground truth to check against. Fix the ground truth first.