Chris Learey/Selected work/Case study

AI-Native Product

A tool that refuses to look finished

Describe a screen; get a hand-drawn sketch you're happy to throw away. Delible is an AI-native, low-fidelity prototyping tool that keeps product teams arguing about the idea — not the pixels — and hands engineering the story, not just the screen.

Most prototyping tools race to look done. I built this one to look exactly as done as the thinking actually is. Fidelity is a function of confidence — and the surest way to protect that is to make polish structurally impossible, not merely discouraged.
Exhibit 01 — Live product

Delible

Describe a screen; get four divergent, hand-drawn takes on it — pictured: the real picker, one prompt solved four ways.

[Image: the Delible picker, showing four divergent, hand-drawn ‘Diamond Recommendation Tool’ sketches generated from one prompt, laid out in a 2×2 grid so you can pick one to iterate.]
Open the live product: delible.dev
My role
Founder & sole builder — product strategy, the design system, the AI generation pipeline, the web app, and the MCP server
Context
A standalone product, originated as an internal tool ("Nivoda Sketch") to get a B2B marketplace's PMs off Lovable
Duration
Internal v1 to shipped SaaS — live at app.delible.dev
Stack
Next.js 15 · React 19 · Supabase · Claude API · rough.js + blackchalk · MCP server · Stripe
What shipped
Live web app · open-source component library (~70 components) · 4-tool MCP server · permalink publishing · markdown + PDF brief export · eval harness
A blank-repo-to-shipped product: the thesis, the constraint system that enforces it, the AI pipeline I had to make trustworthy, and the data layer that turns a disposable sketch into something a team can hand off.

The problem

Polish arrives before the idea is ready

The dominant way teams prototype now is to describe a screen to an AI and get back something that looks finished. That feels like progress. It isn't — it's the most expensive failure mode in early product work, and it shows up two ways:

  • Premature polish skews the feedback. When a prototype looks done, stakeholders critique the gradient, not the flow. A polished mock whispers "this is decided" whether or not it is — so the structural questions that actually matter early never get asked.
  • Context is disposable. The incumbent (Lovable) spins up a working prototype fast, then discards the why the moment you export. Engineering inherits code without story; design inherits a screen without intent. Every handoff is a context reset.

The deeper issue is that fidelity and confidence come uncoupled. Confidence in a solution builds slowly — through feedback, iteration, killed ideas. But AI tools jump straight to maximum fidelity on the first prompt. The gap between the two is the premature-polish tax:

[Chart: fidelity (AI tools) and confidence (reality) plotted against time; the gap between the two curves is the premature-polish tax.]
Fidelity should track confidence. AI tools decouple them — and everything in the gap is feedback aimed at the wrong layer.
Delible's origin makes the pain concrete: it began as "Nivoda Sketch," an internal tool with one disarmingly specific success metric — "PMs stop using Lovable." A real first customer, a real measured pain, validated inside a company before it became a product.

The thesis

Make polish impossible, not discouraged

Every other lo-fi tool relies on convention — a style guide, a reviewer's restraint, a "please keep it rough." Conventions leak. So I built the constraint into the rendering layer: there is no path by which Delible produces something that looks finished.

Everything renders hand-drawn (rough.js wobble, no straight machine edges), strictly monochrome, with border-radius, box-shadow, gradients, and decorative transitions banned at the system level. You can't polish your way out of the sketch — so the conversation stays on structure, flow, and hierarchy, which is exactly where early-stage work needs it.
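
To make "banned at the system level" concrete, here is a minimal sketch of what such a check could look like. It is an illustration only, not Delible's actual code: the function name, the style-map shape, and the list of neutral values are assumptions, and it simplifies by flagging every transition rather than only decorative ones.

```typescript
// Hypothetical sketch: only the banned properties come from the text above;
// every name and data shape here is an assumption made for illustration.
type StyleMap = { [property: string]: string };

const BANNED_PROPERTIES: string[] = ["border-radius", "box-shadow", "transition"];

// Values that switch a property off are treated as harmless.
const NEUTRAL_VALUES: string[] = ["0", "0px", "none"];

function findPolishViolations(style: StyleMap): string[] {
  const violations: string[] = [];
  for (const property of Object.keys(style)) {
    const value = String(style[property]).trim().toLowerCase();
    const isBanned = BANNED_PROPERTIES.indexOf(property) !== -1;
    if (isBanned && NEUTRAL_VALUES.indexOf(value) === -1) {
      violations.push(property + " is banned");
    }
    // Gradients show up as values (for example in `background`), not as a property name.
    if (/gradient\(/.test(value)) {
      violations.push(property + " uses a gradient");
    }
  }
  return violations;
}
```

A style map that returns an empty list passes; anything else would be rejected before it reaches the canvas.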

[Chart: fidelity and confidence plotted against time as one curve (fidelity = confidence), leaving no gap to pay for.]
Constrain the output and fidelity can only rise as confidence does. The manifesto line: look as done as you actually are.

That's the wedge. But the constraint isn't the product — it's the price of admission. The real product is the layer underneath: while you sketch, Delible quietly builds the brief — a versioned record of decisions, rationale, and open questions — and mints a shareable permalink that outlives the AI session. The loop is four moves: type → see → steer → hand off. Engineering gets the story, not a context reset.

How I built it

Four layers, constraint to handoff

1 · A design system that can't be polished

The aesthetic is the enforcement mechanism, so it had to be a real system, not a CSS theme. I built blackchalk — a monochrome, rough.js-based React kit of ~70 components — and published it open-source. Stroke weight is governed by exactly two tokens (surfaces recede, interactive elements come forward); colour is greyscale-only; the banned-polish list lives in code, not a wiki.

What I believe: a constraint you can route around isn't a constraint. I wired a monochrome lint that fails the build on any non-zero-saturation colour — so "you can't polish it" is enforced by CI, the same way Clarity makes wrong UI uncompilable.
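
A lint like this can be small. The sketch below is an assumption about how one might be written, not the lint Delible ships: it relies on the fact that a colour has zero saturation exactly when its red, green, and blue channels are equal, and it only inspects hex literals (a fuller version would also need to cover rgb(), hsl(), and named colours). All function names are hypothetical.

```typescript
// Hypothetical sketch of a monochrome lint; names are illustrative assumptions.

// Expand #rgb or #rrggbb into numeric channels.
function hexToRgb(hex: string): [number, number, number] {
  let digits = hex.slice(1);
  if (digits.length === 3) {
    digits =
      digits.charAt(0) + digits.charAt(0) +
      digits.charAt(1) + digits.charAt(1) +
      digits.charAt(2) + digits.charAt(2);
  }
  return [
    parseInt(digits.slice(0, 2), 16),
    parseInt(digits.slice(2, 4), 16),
    parseInt(digits.slice(4, 6), 16),
  ];
}

// Saturation is zero exactly when all three channels are equal.
function isGreyscale(hex: string): boolean {
  const [r, g, b] = hexToRgb(hex);
  return r === g && g === b;
}

// Return every hex colour in the source text that carries any saturation.
function lintMonochrome(source: string): string[] {
  const matches = source.match(/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g) || [];
  return matches.filter((colour) => !isGreyscale(colour));
}
```

In CI, a wrapper script would run `lintMonochrome` over each stylesheet and exit non-zero whenever the returned list is not empty.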

2 · AI generation you can trust

  • Rules as the system prompt: the canonical design rules (RULES.md) are the single source of truth and ship into the generator's prompt — one rulebook for humans, the web app, and the MCP server.
  • A validator as a guardrail: generated JSX is checked against the same rules production enforces, so off-system output never reaches the canvas.
  • An eval harness, not vibes: a fixed prompt set across core archetypes, stretch domains, and edge cases runs offline and scores pass/fail against the validator — currently 42/43 (97.7%).
  • Divergent options, then refine: a run returns several genuinely different takes on the one screen (up to four, A–D), not a single answer — then you iterate the keepers down to a direction. Compare-and-choose is itself a fidelity guardrail.
[Diagram: the prompt “a diamond search” passes through Claude (RULES.md system prompt, validator) and fans out to Options A, B, C and D; the branches are labelled refine ×4, refine ×2, set aside, and handoff.]
One prompt fans out to four divergent options. You refine the promising ones — A four times, C twice — then hand off the single chosen direction to design and engineering. The rest are set aside, not deleted.
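
The eval harness described in the list above can be pictured as a loop that runs a fixed prompt set through the generator and scores each output with the validator. The sketch below is an assumption-laden illustration, not the real harness: the type names, the stand-in validator, and the synchronous generator signature are all hypothetical (a real call to the Claude API would be asynchronous).

```typescript
// Hypothetical sketch of an offline eval loop; all names are illustrative assumptions.
type Category = "core" | "stretch" | "edge";

interface EvalCase {
  id: string;
  category: Category;
  prompt: string;
}

interface EvalResult {
  id: string;
  passed: boolean;
  violations: string[];
}

type Generator = (prompt: string) => string; // returns generated JSX
type Validator = (jsx: string) => string[];  // returns rule violations

// Stand-in validator: flags two of the banned-polish properties by name.
const validateJsx: Validator = (jsx) => {
  const violations: string[] = [];
  if (/borderRadius|boxShadow/.test(jsx)) {
    violations.push("banned polish property");
  }
  return violations;
};

// A case passes only when the validator reports no violations.
function runEvals(cases: EvalCase[], generate: Generator, validate: Validator): EvalResult[] {
  return cases.map((evalCase) => {
    const violations = validate(generate(evalCase.prompt));
    return { id: evalCase.id, passed: violations.length === 0, violations };
  });
}

// Format the score the way the text reports it, for example "42/43 (97.7%)".
function passRate(results: EvalResult[]): string {
  const passed = results.filter((result) => result.passed).length;
  const percent = results.length === 0 ? 0 : (passed / results.length) * 100;
  return passed + "/" + results.length + " (" + percent.toFixed(1) + "%)";
}
```

Because the harness takes the validator as a parameter, the same function that guards production output can score the eval set, which is what makes the pass rate meaningful.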

3 · The context layer — the actual moat

Anyone can clone a hand-drawn look in a weekend; rough.js is open. What's hard to copy is the data layer. As you iterate, Delible captures a versioned markdown brief — problem, decisions, rationale, open questions — exportable as markdown and PDF, and publishes a permalink (a real /api/publish endpoint minting a public slug) that lives in Delible's own database. The sketch escapes the gravity of any single AI session: stakeholders who never opened the tool can view it, and a future session can reopen it with context intact.
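
One way to picture a versioned brief is as an append-only history in which each iteration adds a new version and never edits an old one. The sketch below assumes that shape purely for illustration; the field names mirror the four parts named above, but the types, function names, and markdown layout are hypothetical and say nothing about Delible's actual schema.

```typescript
// Hypothetical sketch of a versioned brief; all names and the layout are assumptions.
interface BriefFields {
  problem: string;
  decisions: string[];
  rationale: string[];
  openQuestions: string[];
}

interface BriefVersion extends BriefFields {
  version: number;
}

const EMPTY_FIELDS: BriefFields = { problem: "", decisions: [], rationale: [], openQuestions: [] };

// Append a new version; earlier versions are never mutated.
function appendVersion(history: BriefVersion[], changes: Partial<BriefFields>): BriefVersion[] {
  const previous: BriefFields = history[history.length - 1] || EMPTY_FIELDS;
  const next: BriefVersion = {
    version: history.length + 1,
    problem: changes.problem ?? previous.problem,
    decisions: changes.decisions ?? previous.decisions,
    rationale: changes.rationale ?? previous.rationale,
    openQuestions: changes.openQuestions ?? previous.openQuestions,
  };
  return history.concat([next]);
}

// Render one version as markdown for export.
function renderBrief(brief: BriefVersion): string {
  const section = (title: string, items: string[]): string =>
    "## " + title + "\n" +
    (items.length > 0 ? items.map((item) => "- " + item).join("\n") : "_none yet_");
  return [
    "# Brief v" + brief.version,
    "## Problem\n" + brief.problem,
    section("Decisions", brief.decisions),
    section("Rationale", brief.rationale),
    section("Open questions", brief.openQuestions),
  ].join("\n\n");
}
```

Keeping every version means a later session, or a stakeholder reading the permalink, can see not only the current decisions but how they changed.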

What I believe: the moat is the database and the brief, not the brushstrokes. Building a sketch tool is easy; building a permalink-routed sketch database with a versioned decision log is a different company.

4 · Two surfaces, one sovereign data layer

Delible ships on two surfaces that share one backend. The web app (app.delible.dev, with Google sign-in and Stripe billing) is the durable home where accounts, data, and permalinks live. The MCP server — four production tools, create_sketch / iterate_sketch / list_sketches / publish_sketch — puts the same product inside Claude as a distribution channel. The data layer stays Delible-owned regardless of surface, so platform risk never touches the moat.
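
The "two surfaces, one backend" idea can be sketched as two thin tool tables built over the same store. This is an illustration under stated assumptions, not the real server: the four tool names come from the text, but the in-memory store, the argument names, the id format, and the slug scheme are all hypothetical, and the sketch does not use the MCP SDK.

```typescript
// Hypothetical sketch: only the four tool names come from the text; the rest is assumed.
type ToolName = "create_sketch" | "iterate_sketch" | "list_sketches" | "publish_sketch";
type ToolArgs = { [key: string]: string };

interface Sketch {
  id: string;
  prompt: string;
  iterations: string[];
  slug: string | null;
}

// In-memory stand-in for the shared, Delible-owned data layer.
class SketchStore {
  private sketches: Sketch[] = [];

  create(prompt: string): Sketch {
    const sketch: Sketch = { id: "sk_" + (this.sketches.length + 1), prompt, iterations: [], slug: null };
    this.sketches.push(sketch);
    return sketch;
  }

  get(id: string): Sketch {
    const found = this.sketches.filter((sketch) => sketch.id === id)[0];
    if (!found) {
      throw new Error("Unknown sketch: " + id);
    }
    return found;
  }

  list(): Sketch[] {
    return this.sketches.slice();
  }
}

// Each surface gets its own tool table, but every handler writes to the same store.
function buildTools(store: SketchStore): Record<ToolName, (args: ToolArgs) => Sketch | Sketch[]> {
  return {
    create_sketch: (args) => store.create(String(args["prompt"])),
    iterate_sketch: (args) => {
      const sketch = store.get(String(args["id"]));
      sketch.iterations.push(String(args["instruction"]));
      return sketch;
    },
    list_sketches: () => store.list(),
    publish_sketch: (args) => {
      const sketch = store.get(String(args["id"]));
      sketch.slug = sketch.slug || "p-" + sketch.id; // hypothetical slug scheme
      return sketch;
    },
  };
}
```

A sketch created through one table and published through another ends up as a single record in the store, which is the property the paragraph above depends on.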

Judgment

The decisions that define it

A product like this is a series of opinionated calls — and the opinion is the artifact:

  • Monochrome, enforced by CI. Greyscale-only isn't a guideline; it's a build failure. The constraint has to be unroutable or it isn't a constraint.
  • Divergent options, then refine — never one-shot. A single output invites "make this one perfect." Several divergent sketches (A–D) keep the user comparing structures, then refining the keepers — exactly the disposable mindset the tool is for.
  • Extracted blackchalk to open source. The library is a credibility signal and a top-of-funnel channel for the PM-native audience — a soft moat and a funnel, deliberately not the business.
  • Positioned against Claude Design, on purpose. Anthropic validated "talk to an AI, get a prototype" — and aimed at polished, on-brand, hi-fi. That leaves the deliberately-disposable lo-fi lane open. "Claude Design makes it look done; Delible makes it look exactly as done as it is."
The throughline: pick the sharp, narrow opinion and build it into the system — a wedge a horizontal platform won't copy because it contradicts their own bet.

How I worked

Blank repo to shipped, solo

Beyond the code, what this build demonstrates:

  • End-to-end ownership. Strategy, the design system, the AI pipeline, the web app, the MCP server, billing — I took it from an empty repo to a deployed, monetised product on my own.
  • Design judgment encoded as engineering. The taste — what "low fidelity" actually means — lives in tokens, a banned-polish list, and a lint rule, not in a deck. The opinion is executable.
  • Validated before generalised. It earned its keep as an internal tool with a measurable goal before it became a SaaS — product-market fit evidence first, platform second.
  • Quality instrumented, not asserted. An eval harness and a production validator mean "the AI output is good" is a number I can re-run, not a claim.
  • Built for the moat, not the demo. The flashy part is the sketch; the defensible part is the brief and the permalink database — and I prioritised the latter.

Where it stands

What shipped

  • Live and monetised — deployed at app.delible.dev with Google auth and Stripe billing wired.
  • The constraint holds structurally — monochrome enforced by lint, polish banned in code; the output can't drift hi-fi.
  • Generation is trustworthy and measured — 42/43 eval pass rate against the same validator production runs.
  • The permalink is real — published sketches mint a public slug and live independently of any AI session.
  • Two surfaces, one backend — a standalone web app as the home, a four-tool MCP server as the distribution channel into Claude.
  • An open-source asset: blackchalk (~70 components), doing double duty as credibility and funnel.

Look as done as you actually are.

Stack at a glance

Next.js 15 · React 19 · Supabase · Claude API · rough.js · blackchalk (OSS) · MCP server · Stripe · Tailwind · TypeScript · Eval harness