Delible — Case Study · Chris Learey

Delible

Describe a screen; get four divergent, hand-drawn takes on it. Pictured: the real picker, one prompt solved four ways.

My role: Founder & sole builder — product strategy, the design system, the AI generation pipeline, the web app, and the MCP server
Context: A standalone product, originated as an internal tool ("Nivoda Sketch") to get a B2B marketplace's PMs off Lovable
Duration: Internal v1 to shipped SaaS — live at app.delible.dev
Stack: Next.js 15 · React 19 · Supabase · Claude API · rough.js + blackchalk · MCP server · Stripe
What shipped: Live web app · open-source component library (~70 components) · 4-tool MCP server · permalink publishing · markdown + PDF brief export · eval harness

A blank-repo-to-shipped product: the thesis, the constraint system that enforces it, the AI pipeline I had to make trustworthy, and the data layer that turns a disposable sketch into something a team can hand off.

The problem

Polish arrives before the idea is ready

The dominant way teams prototype now is to describe a screen to an AI and get back something that looks finished. That feels like progress. It isn't — it's the most expensive failure mode in early product work, and it shows up two ways:

Premature polish skews the feedback. When a prototype looks done, stakeholders critique the gradient, not the flow. A polished mock whispers "this is decided" whether or not it is — so the structural questions that actually matter early never get asked.
Context is disposable. The incumbent (Lovable) spins up a working prototype fast, then discards the why the moment you export. Engineering inherits code without story; design inherits a screen without intent. Every handoff is a context reset.

The deeper issue is that fidelity and confidence come uncoupled. Confidence in a solution builds slowly — through feedback, iteration, killed ideas. But AI tools jump straight to maximum fidelity on the first prompt. The gap between the two is the premature-polish tax:

Fidelity should track confidence. AI tools decouple them — and everything in the gap is feedback aimed at the wrong layer.

Delible's origin makes the pain concrete: it began as "Nivoda Sketch," an internal tool with one disarmingly specific success metric — "PMs stop using Lovable." A real first customer, a real measured pain, validated inside a company before it became a product.

The thesis

Make polish impossible, not discouraged

Every other lo-fi tool relies on convention — a style guide, a reviewer's restraint, a "please keep it rough." Conventions leak. So I built the constraint into the rendering layer: there is no path by which Delible produces something that looks finished.

Everything renders hand-drawn (rough.js wobble, no straight machine edges), strictly monochrome, with border-radius, box-shadow, gradients, and decorative transitions banned at the system level. You can't polish your way out of the sketch — so the conversation stays on structure, flow, and hierarchy, which is exactly where early-stage work needs it.

Constrain the output and fidelity can only rise as confidence does. The manifesto line: look as done as you actually are.

That's the wedge. But the constraint isn't the product — it's the price of admission. The real product is the layer underneath: while you sketch, Delible quietly builds the brief — a versioned record of decisions, rationale, and open questions — and mints a shareable permalink that outlives the AI session. The loop is four moves: type → see → steer → hand off. Engineering gets the story, not a context reset.

How I built it

Four layers, constraint to handoff

1 · A design system that can't be polished

The aesthetic is the enforcement mechanism, so it had to be a real system, not a CSS theme. I built blackchalk — a monochrome, rough.js-based React kit of ~70 components — and published it open-source. Stroke weight is governed by exactly two tokens (surfaces recede, interactive elements come forward); colour is greyscale-only; the banned-polish list lives in code, not a wiki.

What I believe: a constraint you can route around isn't a constraint. I wired a monochrome lint that fails the build on any non-zero-saturation colour — so "you can't polish it" is enforced by CI, the same way Clarity makes wrong UI uncompilable.
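As an illustration only, here is a minimal sketch of how a monochrome lint of this kind might work. It is not blackchalk's actual rule: the function names are hypothetical, and it assumes colours appear as `#rgb` or `#rrggbb` hex literals (named colours, `rgb()`, `hsl()`, and 8-digit hex are out of scope). It relies on one fact: a colour has zero saturation exactly when its red, green, and blue channels are equal.

```typescript
// Hypothetical monochrome lint: flags any hex colour whose saturation is non-zero.
// Assumption: colours are written as #rgb or #rrggbb literals; other syntaxes are ignored.

const HEX_COLOUR = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;

/** Expand #rgb to #rrggbb and return the three channels as numbers. */
function channels(hex: string): [number, number, number] {
  const body = hex.slice(1);
  const full = body.length === 3 ? body.split("").map((c) => c + c).join("") : body;
  return [
    parseInt(full.slice(0, 2), 16),
    parseInt(full.slice(2, 4), 16),
    parseInt(full.slice(4, 6), 16),
  ];
}

/** Zero saturation means red, green, and blue are all equal. */
function isGreyscale(hex: string): boolean {
  const [r, g, b] = channels(hex);
  return r === g && g === b;
}

/** Return every non-greyscale hex literal found in a source string. */
function findColourViolations(source: string): string[] {
  return (source.match(HEX_COLOUR) ?? []).filter((hex) => !isGreyscale(hex));
}

const sample = `
  .card { stroke: #111; background: #fafafa; }
  .cta  { background: #2563eb; }
`;
const violations = findColourViolations(sample);
// A CI step could exit non-zero whenever this list is non-empty.
console.log(violations);
```

In this sample only the blue call-to-action colour is reported; the two greys pass. A real build step would walk the source tree and fail on the first non-empty result.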

2 · AI generation you can trust

Rules as the system prompt: the canonical design rules (RULES.md) are the single source of truth and ship into the generator's prompt — one rulebook for humans, the web app, and the MCP server.
A validator as a guardrail: generated JSX is checked against the same rules production enforces, so off-system output never reaches the canvas.
An eval harness, not vibes: a fixed prompt set across core archetypes, stretch domains, and edge cases runs offline and scores pass/fail against the validator — currently 42/43 (97.7%).
Divergent options, then refine: a run returns several genuinely different takes on the one screen (up to four, A–D), not a single answer — then you iterate the keepers down to a direction. Compare-and-choose is itself a fidelity guardrail.
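The validator and eval harness described above can be sketched roughly as follows. This is a hedged illustration, not Delible's implementation: the rule names, the regex-based checks, the `EvalCase` shape, and the stub generator are all assumptions. The real validator works from RULES.md and the real harness calls the model. The point it shows is the structure: one `validate` function, reused by an offline runner that turns a fixed prompt set into a pass rate.

```typescript
// Hypothetical sketch: one validator shared by the eval harness and (by assumption) production.
// The banned list mirrors the properties named in this case study; rule names are illustrative.

type Verdict = { ok: boolean; violations: string[] };

const BANNED_PATTERNS: Array<{ rule: string; pattern: RegExp }> = [
  { rule: "no-border-radius", pattern: /border-?radius/i },
  { rule: "no-box-shadow", pattern: /box-?shadow/i },
  { rule: "no-gradient", pattern: /gradient\(/i },
  { rule: "no-transition", pattern: /transition/i }, // crude: also matches the word in copy
];

/** Check a generated JSX string against the banned-polish list. */
function validate(jsx: string): Verdict {
  const violations = BANNED_PATTERNS
    .filter(({ pattern }) => pattern.test(jsx))
    .map(({ rule }) => rule);
  return { ok: violations.length === 0, violations };
}

type EvalCase = { id: string; prompt: string };
type Generator = (prompt: string) => string;

/** Run a fixed prompt set through a generator and score pass/fail with the validator. */
function runEval(cases: EvalCase[], generate: Generator) {
  const results = cases.map((c) => ({ id: c.id, ...validate(generate(c.prompt)) }));
  const passed = results.filter((r) => r.ok).length;
  const rate = cases.length === 0 ? 0 : passed / cases.length;
  return { passed, total: cases.length, rate, failures: results.filter((r) => !r.ok) };
}

// Stub generator standing in for the real model call, so the harness runs offline.
const stubGenerate: Generator = (prompt) =>
  prompt.includes("card")
    ? `<div style={{ borderRadius: 8 }}>Card</div>`
    : `<Frame><Heading>Settings</Heading></Frame>`;

const report = runEval(
  [
    { id: "settings", prompt: "a settings screen" },
    { id: "login", prompt: "a login form" },
    { id: "pricing", prompt: "a pricing card" },
  ],
  stubGenerate,
);
console.log(`${report.passed}/${report.total}`, (report.rate * 100).toFixed(1) + "%");
console.log(report.failures);
```

Because the harness and the canvas guardrail call the same `validate`, a pass rate such as 42/43 measures the thing production actually enforces rather than a separate, drifting checklist.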

One prompt fans out to four divergent options. You refine the promising ones — A four times, C twice — then hand off the single chosen direction to design and engineering. The rest are set aside, not deleted.

3 · The context layer — the actual moat

Anyone can clone a hand-drawn look in a weekend; rough.js is open. What's hard to copy is the data layer. As you iterate, Delible captures a versioned markdown brief — problem, decisions, rationale, open questions — exportable as markdown and PDF, and publishes a permalink (a real /api/publish endpoint minting a public slug) that lives in Delible's own database. The sketch escapes the gravity of any single AI session: stakeholders who never opened the tool can view it, and a future session can reopen it with context intact.
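To make the data layer concrete, here is a minimal sketch of a versioned brief and a permalink slug. Everything in it is an assumption for illustration: the `BriefVersion` and `Sketch` field names, the append-only versioning, and the slug format are hypothetical, not Delible's schema or what `/api/publish` actually does.

```typescript
// Hypothetical data shapes: field names are illustrative, not Delible's schema.

type BriefVersion = {
  version: number;
  problem: string;
  decisions: string[];
  rationale: string[];
  openQuestions: string[];
};

type Sketch = { id: string; title: string; versions: BriefVersion[]; slug?: string };

/** Append a new brief version without mutating earlier ones, so the decision log stays intact. */
function addVersion(sketch: Sketch, next: Omit<BriefVersion, "version">): Sketch {
  const version = sketch.versions.length + 1;
  return { ...sketch, versions: [...sketch.versions, { version, ...next }] };
}

/** Build a URL-safe slug from the title plus a short suffix supplied by the caller. */
function mintSlug(title: string, suffix: string): string {
  const base = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${base}-${suffix}`;
}

/** Render the latest version as markdown, one possible shape for a brief export. */
function toMarkdown(sketch: Sketch): string {
  const latest = sketch.versions[sketch.versions.length - 1];
  if (!latest) return `# ${sketch.title}\n`;
  const list = (items: string[]) => items.map((i) => `- ${i}`).join("\n");
  return [
    `# ${sketch.title} (v${latest.version})`,
    `## Problem\n${latest.problem}`,
    `## Decisions\n${list(latest.decisions)}`,
    `## Rationale\n${list(latest.rationale)}`,
    `## Open questions\n${list(latest.openQuestions)}`,
  ].join("\n\n");
}

let sketch: Sketch = { id: "sk_1", title: "Order Picker", versions: [] };
sketch = addVersion(sketch, {
  problem: "Buyers cannot compare items side by side.",
  decisions: ["Use a two-column layout"],
  rationale: ["Comparison is the primary task"],
  openQuestions: ["How many items can be compared at once?"],
});
sketch = addVersion(sketch, {
  problem: "Buyers cannot compare items side by side.",
  decisions: ["Use a two-column layout", "Pin the filters on the left"],
  rationale: ["Comparison is the primary task", "Filters are used on every visit"],
  openQuestions: [],
});

// The random suffix is passed in here so the example stays deterministic.
const published: Sketch = { ...sketch, slug: mintSlug(sketch.title, "a1b2") };
console.log(published.slug);
console.log(toMarkdown(published));
```

Keeping every version rather than overwriting the brief is what lets a later session, or a stakeholder who never opened the tool, see why a decision changed and not only what the current sketch looks like.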

What I believe: the moat is the database and the brief, not the brushstrokes. Building a sketch tool is easy; building a permalink-routed sketch database with a versioned decision log is a different company.

4 · Two surfaces, one sovereign data layer

Delible ships on two surfaces that share one backend. The web app (app.delible.dev, with Google sign-in and Stripe billing) is the durable home where accounts, data, and permalinks live. The MCP server — four production tools, create_sketch / iterate_sketch / list_sketches / publish_sketch — puts the same product inside Claude as a distribution channel. The data layer stays Delible-owned regardless of surface, so platform risk never touches the moat.
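The "two surfaces, one backend" shape can be sketched as a small dispatcher. The four tool names come from this case study; the argument shapes, the `SketchStore` class, and the slug format are assumptions made for illustration, and the sketch deliberately avoids any MCP SDK calls. It only shows how every tool call could resolve against the same store the web app would use.

```typescript
// Hypothetical sketch: tool names are from the case study; argument shapes and the store are assumed.

type ToolCall =
  | { tool: "create_sketch"; prompt: string }
  | { tool: "iterate_sketch"; id: string; instruction: string }
  | { tool: "list_sketches" }
  | { tool: "publish_sketch"; id: string };

type StoredSketch = { id: string; prompt: string; iterations: string[]; slug: string | null };

/** Stand-in for the shared backend that both the web app and the MCP server would call. */
class SketchStore {
  private sketches: StoredSketch[] = [];
  private nextId = 1;

  create(prompt: string): StoredSketch {
    const sketch: StoredSketch = { id: `sk_${this.nextId++}`, prompt, iterations: [], slug: null };
    this.sketches.push(sketch);
    return sketch;
  }

  get(id: string): StoredSketch {
    const sketch = this.sketches.find((s) => s.id === id);
    if (!sketch) throw new Error(`Unknown sketch: ${id}`);
    return sketch;
  }

  list(): StoredSketch[] {
    return this.sketches.slice();
  }
}

/** Route a tool call to the store; the switch is exhaustive over the four tools. */
function handleToolCall(store: SketchStore, call: ToolCall): StoredSketch | StoredSketch[] {
  switch (call.tool) {
    case "create_sketch":
      return store.create(call.prompt);
    case "iterate_sketch": {
      const sketch = store.get(call.id);
      sketch.iterations.push(call.instruction);
      return sketch;
    }
    case "list_sketches":
      return store.list();
    case "publish_sketch": {
      const sketch = store.get(call.id);
      // Publishing is idempotent here: an existing slug is kept.
      sketch.slug = sketch.slug ?? `p-${sketch.id}`;
      return sketch;
    }
  }
}

const store = new SketchStore();
const created = handleToolCall(store, { tool: "create_sketch", prompt: "order picker" }) as StoredSketch;
handleToolCall(store, { tool: "iterate_sketch", id: created.id, instruction: "pin filters on the left" });
const live = handleToolCall(store, { tool: "publish_sketch", id: created.id }) as StoredSketch;
const all = handleToolCall(store, { tool: "list_sketches" }) as StoredSketch[];
console.log(live.slug, all.length);
```

Because the handler owns no state of its own, swapping the calling surface (web route or MCP tool) leaves the sketches, briefs, and permalinks exactly where they were.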

Judgment

The decisions that define it

A product like this is a series of opinionated calls — and the opinion is the artifact:

Monochrome, enforced by CI. Greyscale-only isn't a guideline; it's a build failure. The constraint has to be unroutable or it isn't a constraint.
Divergent options, then refine — never one-shot. A single output invites "make this one perfect." Several divergent sketches (A–D) keep the user comparing structures, then refining the keepers — exactly the disposable mindset the tool is for.
Extracted blackchalk to open source. The library is a credibility signal and a top-of-funnel channel for the PM-native audience — a soft moat and a funnel, deliberately not the business.
Positioned against Claude Design, on purpose. Anthropic validated "talk to an AI, get a prototype" — and aimed at polished, on-brand, hi-fi. That leaves the deliberately-disposable lo-fi lane open. "Claude Design makes it look done; Delible makes it look exactly as done as it is."

The throughline: pick the sharp, narrow opinion and build it into the system. That makes it a wedge a horizontal platform won't copy, because copying it would contradict the platform's own bet.

How I worked

Blank repo to shipped, solo

Beyond the code, what this build demonstrates:

End-to-end ownership. Strategy, the design system, the AI pipeline, the web app, the MCP server, billing — I took it from an empty repo to a deployed, monetised product on my own.
Design judgment encoded as engineering. The taste — what "low fidelity" actually means — lives in tokens, a banned-polish list, and a lint rule, not in a deck. The opinion is executable.
Validated before generalised. It earned its keep as an internal tool with a measurable goal before it became a SaaS — product-market fit evidence first, platform second.
Quality instrumented, not asserted. An eval harness and a production validator mean "the AI output is good" is a number I can re-run, not a claim.
Built for the moat, not the demo. The flashy part is the sketch; the defensible part is the brief and the permalink database — and I prioritised the latter.

Where it stands

What shipped

Live and monetised — deployed at app.delible.dev with Google auth and Stripe billing wired.
The constraint holds structurally — monochrome enforced by lint, polish banned in code; the output can't drift hi-fi.
Generation is trustworthy and measured — 42/43 eval pass rate against the same validator production runs.
The permalink is real — published sketches mint a public slug and live independently of any AI session.
Two surfaces, one backend — a standalone web app as the home, a four-tool MCP server as the distribution channel into Claude.
An open-source asset — blackchalk (~70 components) doing double duty as credibility and funnel.

Look as done as you actually are.

Stack at a glance

Next.js 15 · React 19 · Supabase · Claude API · rough.js · blackchalk (OSS) · MCP server · Stripe · Tailwind · TypeScript · Eval harness