Clarity Design System — Case Study

Clarity

A code-first, AI-native design system that makes wrong UI impossible — built, documented, and running in production.

My role: Design systems lead — owned strategy, token architecture, component API design, build tooling, governance, and docs
Context: A global B2B jewellery marketplace (anonymized)
Duration: ~3 months, foundations to ~63 components
Stack: W3C DTCG tokens (OKLCH) · shadcn/ui + Radix · Tailwind v4 · Nx monorepo · Fumadocs · Storybook
What shipped: Token pipeline · component library · live validation app · agent-readable docs · governance model · 4 ADRs

A look at how I think about design systems end-to-end — the diagnosis, the architecture, the judgment calls, and how I set up the system (and the team) to own it after me.

The problem

A system that suggested, but didn't enforce

The client had a working design system — Material UI, a hand-rolled token pipeline, 60+ components. On paper, fine. In practice, it leaked design intent at every seam:

per-component override files fighting Material's defaults

723

sx={} escapes — each a place a token could be ignored

component-specific token files to govern by hand

The team was paying, continuously, to make a Material system not look like Material. Responsiveness was opt-in, bolted on per component rather than encoded in the contract. And the cost showed up as a loop that dominates UI delivery almost everywhere:

The loop exists because "correctly implemented" was never the same thing as "design-correct."

Industry research puts a number on this kind of friction — 66% of teams lose 25–50% of their time to design-delivery inefficiency, an estimated ~$298k/year in lost productivity per product pod. (Builder.io — cited as industry context, not a client-measured result.)

The thesis

The components are the design

Not "the components follow the design." The components encode it — tokens, spacing, type, accessibility, responsiveness, and the legal range of variants — inside the API. The guard-rails live in the prop signatures, not in a Figma file or a reviewer's eye.

The consequence is the whole point: there is no path by which an engineer builds something "correctly" that is also "design-incorrect." Design stops reviewing implementations because there's nothing left to catch. The QA loop doesn't get faster — it gets deleted.

Same work, with the review loop removed — because correctness is structural, not reviewed.

A second thesis followed from where the industry is going: if the components are the source of truth, that source should be legible to machines, not just people. Engineers, designers, and PMs increasingly build through AI coding agents. They should all consume the same library and inherit the same guarantee — so the system was designed AI-native from the foundation. The repo itself is the API; the documentation is the interface an agent reads.

How I approached it

Four phases, foundations to library

1 · Diagnose before building

I didn't start with components. I started with a forensic audit of the existing system, because the fastest way to lose a design-system rebuild is to recreate the old one with nicer syntax. The audit produced the numbers above and — critically — a token-by-token comparison against what was actually shipping in production.

That de-risked everything downstream: I made 15 explicit token-alignment decisions so the new system's values matched production. Migration could then happen incrementally with zero visual drift.

What I believe: evidence first. A one-page diagnostic with real counts wins the room and tells you exactly what to delete.

2 · Tokens as the contract

Source of truth: W3C DTCG JSON in Git — vendor-neutral, spec-backed, future-proof against Figma's native token import.
Two tiers, deliberately: primitives (raw OKLCH, e.g. color.azure.500) and semantic role tokens (action.primary, feedback.error). Component values live in variants, not a third tier — which kept the governance surface small.
OKLCH colour for perceptually-even ramps and predictable dark mode.
One source, every platform: compiles to web CSS, a shadcn theme, JS/TS for React Native, inline email values, and backend JSON. Semantic tokens emit var(--…) chains, so a primitive can be retargeted without recompiling consumers.

One token source compiles to every platform. I replaced Style Dictionary with a ~230-line bespoke build once the vendor cost more than it saved (ADR-001).

What I believe: semantic tokens with intent in the name (color-feedback-error, never blue-5) are the single highest-leverage thing you can do — for human clarity and for AI output quality.

3 · Components on headless primitives

The library is built the way the best teams build in 2026 — Radix headless primitives + Tailwind + code-ownership (shadcn distribution), the same foundation OpenAI, Vercel, Linear, and Supabase run in production. No design language is inherited from a vendor; nothing has to be overridden back out.

Scope reached ~63 components across four atomic tiers — 38 atoms, 16 molecules, organisms (FilterToolbar, Megamenu, Sidebar) and page templates. I hardened one reference component (Button) to define what "finished" means, so the rest of the library had a quality bar to seed against and harden through real use. Crucially, the library was validated against a live test application wired to real data — a component isn't done because it renders in isolation; it's done when it survives a real product surface.

4 · Documentation that is the interface

A top-level library map — a for/not-for index of every component, designed to be the first thing an agent reads.
63 colocated docs — props, usage, do/don't — living next to the code so they never drift.
TypeScript contracts as the spec, Storybook stories as executable examples, the Nx graph as machine-readable structure.

Because the docs are the repo, an AI agent always reads the current truth. No sync step, no stale mirror.

Judgment

Decisions, recorded as ADRs

A design system is mostly a series of "do we adopt this tool?" calls. I recorded four — the reasoning is the artifact, not the choice:

Bespoke token build over Style Dictionary — adopt vendor tooling until it costs more than it saves; then own a small, tested script. Reversible: the DTCG source never changed.
Removed visual-regression testing — right tool, wrong time. With one finished component and no CI, it was pure overhead (and a live secret-exposure risk). Re-evaluate at 10+ components.
Rejected Figma↔code token sync — bidirectional sync solves a problem this team didn't have. Designers explore in Figma; they don't author tokens there.
Markdown docs over a hosted hub — when the source is already agent-readable, the docs site should read it directly rather than maintain a polished copy.

The throughline: adopt the minimum tooling the stage justifies, document why, keep it reversible. That discipline is most of what keeps a young design system from collapsing under its own process.

How I worked

What a system like this lives or dies on

Beyond the code, the things that decide whether a design system survives:

End-to-end ownership. I led this from strategy through token architecture, component APIs, the build pipeline, governance, and docs — not a slice handed off, but the whole spine.
Designed for the handover, not for me. Two token tiers, a tested ~230-line build a non-engineer can read, decisions captured as ADRs, docs that live in the repo. The point was a system the team could own and extend after I stepped out — not a black box only I understand.
De-risked the politics, not just the code. Pre-aligning tokens to production values meant migration was invisible to users and uncontroversial to leadership — adoption didn't need a flag day or a re-skin everyone would argue about.
Set the quality bar by example. One reference component, fully finished, defined "done" for everyone (and every agent) that came after it.
Worked with how designers actually work. Designers explore in Figma; I made the code the source of truth without forcing them to author tokens in a tool they don't live in.

Outcomes

What changed

The QA loop is structurally gone, not optimized. Engineers import components that are the design; design has nothing to catch.
Design hours redirected from reviewing implementations to building components, documentation, and governance.
Zero-drift migration path — new values pre-aligned to production, so adoption is incremental and invisible to users.
Self-service for non-engineers — designers and PMs build correct UI directly through AI agents, against the same library, with the same guarantee.
A clean, reviewable Git history — phased, conventionally-committed, decisions recorded. Auditable, not a black box.
A small, ownable governance surface — two token tiers and component-encoded variants instead of 32 token files and 81 override sheets.

If you can build it, it's already design-correct.

Stack at a glance

W3C DTCG tokens OKLCH shadcn/ui Radix UI Tailwind v4 Nx monorepo Fumadocs Storybook Vitest TypeScript Claude Code / Cursor native