html-builder-v7 — twin-driven mock → HTML

The core idea

An image model can dream up beautiful, arbitrary designs that CSS templates can't. But an image is just pixels: no text, no structure, no SEO. v7 keeps the beauty and puts the meaning back. It locates every known string in the picture, erases the baked-in text to leave a clean “plate,” and re-renders the words as real HTML on top. Every string is already known from the twin (the YAML description of the site's content), so we never guess what is text; we only find where it is.

DATA

Twin + Mock

YAML twin (every string, by role) + the rendered mock image.

STEP 1

Localize

Find each known string's pixel box — EasyOCR, auto-escalating to Gemini vision for script fonts.
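Twin-anchored localization with engine escalation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lib.py code. `Box`, `anchor_boxes`, `coverage`, `localize`, the 0.8 similarity cutoff, and the `run_ocr` / `run_vision` callables (stand-ins for EasyOCR and the Gemini vision call) are all hypothetical. difflib's `SequenceMatcher.ratio()` stands in for whatever string matching the pipeline really uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Box:
    text: str  # text the engine read inside the box
    x: int
    y: int
    w: int
    h: int


def _norm(s):
    # Case- and whitespace-insensitive comparison key
    return " ".join(s.lower().split())


def anchor_boxes(twin_strings, boxes, min_ratio=0.8):
    """Give each known twin string its best-matching box; unmatched boxes are dropped."""
    placed = {}
    for target in twin_strings:
        best, best_score = None, min_ratio
        for box in boxes:
            score = SequenceMatcher(None, _norm(target), _norm(box.text)).ratio()
            if score >= best_score:
                best, best_score = box, score
        if best is not None:
            placed[target] = best
    return placed


def coverage(twin_strings, placed):
    return len(placed) / len(twin_strings) if twin_strings else 1.0


def localize(twin_strings, run_ocr, run_vision, threshold=1.0, retries=2):
    """Try cheap OCR first; escalate to the vision model only if coverage is short."""
    placed = anchor_boxes(twin_strings, run_ocr())
    if coverage(twin_strings, placed) >= threshold:
        return placed, "ocr"
    # Retry-keep-best: call the vision model several times, keep the best attempt
    best = {}
    for _ in range(retries):
        attempt = anchor_boxes(twin_strings, run_vision(twin_strings))
        if len(attempt) > len(best):
            best = attempt
    # Missing sweep: ask again only for the strings that are still unplaced
    missing = [s for s in twin_strings if s not in best]
    if missing:
        best.update(anchor_boxes(missing, run_vision(missing)))
    return best, "vision"
```

A box that matches no twin string is simply never assigned, so junk such as crust texture drops out without any special filtering. The sketch assigns one box per string and does not merge EasyOCR fragments; fragmented script text therefore shows up as low coverage, which is what triggers escalation.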

STEP 2

Erase

Photo-aware inpaint → a “clean plate” with text gone but imagery intact.
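The choice between inpainting the whole text rectangle and inpainting only the glyph strokes can be made from the pixels in a thin ring around each box. The following is a stdlib-only sketch that assumes two tests: low brightness spread means a flat panel, and saturated pixels that all share one hue mean a halftone/pattern band, which is treated as flat. The function names and thresholds (`flat_std=12`, `sat_min=0.25`, `max_spread=0.05`) are hypothetical, not the pipeline's calibrated values, and the inpainting itself is left out.

```python
import colorsys
from statistics import pstdev


def luma(rgb):
    """Approximate perceived brightness (Rec. 601 weights), 0-255."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def ring_pixels(img, x, y, w, h, pad=4):
    """Pixels in a pad-wide band around the box; img is a list of rows of (r, g, b)."""
    H, W = len(img), len(img[0])
    out = []
    for yy in range(max(0, y - pad), min(H, y + h + pad)):
        for xx in range(max(0, x - pad), min(W, x + w + pad)):
            if not (x <= xx < x + w and y <= yy < y + h):
                out.append(img[yy][xx])
    return out


def hue_spread(hues):
    """Smallest arc of the hue circle (0-1) that contains every hue."""
    hs = sorted(hues)
    gaps = [b - a for a, b in zip(hs, hs[1:])] + [hs[0] + 1 - hs[-1]]
    return 1 - max(gaps)


def is_hue_uniform(pixels, sat_min=0.25, max_spread=0.05):
    """True if all saturated pixels share nearly one hue (e.g. red halftone on white)."""
    hsv = (colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels)
    hues = [h for h, s, _ in hsv if s >= sat_min]  # greys and whites carry no hue
    return bool(hues) and hue_spread(hues) <= max_spread


def erase_mode(ring, flat_std=12.0):
    """'rect': inpaint the whole text box. 'strokes': inpaint glyph pixels only."""
    if pstdev([luma(p) for p in ring]) <= flat_std or is_hue_uniform(ring):
        return "rect"
    return "strokes"
```

Hue is circular, which is why `hue_spread` measures the largest gap around the circle instead of taking max minus min; otherwise reds near 0.0 and 1.0 would look maximally different.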

STEP 3a

Overlay

Real HTML text positioned over the plate — pixel-faithful to the mock.
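Overlay positioning is easiest to keep faithful when every box is expressed as a fraction of the mock, so the whole composition scales as one picture. This sketch assumes that approach: `left/top/width/height` are percentages of the plate, and font size uses CSS container query units (`cqw`, resolved against a wrapper with `container-type: inline-size`). `overlay_span`, `overlay_page`, and the `(text, box, color, font_px)` tuple layout are hypothetical names, not the project's API.

```python
from html import escape


def overlay_span(text, box, img_w, img_h, color="#000", font_px=16):
    """One absolutely positioned text run; geometry in % of the mock so it scales."""
    x, y, w, h = box
    style = (
        f"position:absolute;left:{100 * x / img_w:.2f}%;top:{100 * y / img_h:.2f}%;"
        f"width:{100 * w / img_w:.2f}%;height:{100 * h / img_h:.2f}%;"
        # 1cqw = 1% of the container width, so the text scales with the plate
        f"font-size:{100 * font_px / img_w:.2f}cqw;color:{color};white-space:nowrap"
    )
    return f'<span style="{style}">{escape(text)}</span>'


def overlay_page(plate_url, img_w, img_h, runs):
    """runs: iterable of (text, box, color, font_px) in mock-pixel coordinates."""
    spans = "\n  ".join(overlay_span(t, b, img_w, img_h, c, f) for t, b, c, f in runs)
    wrapper = (
        "position:relative;container-type:inline-size;"
        f"aspect-ratio:{img_w}/{img_h};"
        f"background:url('{escape(plate_url)}') 0 0/100% 100% no-repeat"
    )
    return f'<div style="{wrapper}">\n  {spans}\n</div>'
```

This is also why the overlay scales rather than reflows: every coordinate is tied to the plate's aspect ratio, not to the text flow.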

STEP 3b

Semantic

Same visuals as crawlable HTML: h1/section/nav, lazy-loaded background slices, img alt, JSON-LD.
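The semantic build's "one h1, then h2/h3" rule can be enforced by mapping twin roles to tags in reading order. This sketch assumes a hypothetical role taxonomy (`hero_title`, `section_title`, `card_title`, `body`); the real taxonomy lives in lib.py and may use different names. Extra h1 candidates are demoted to h2, and if no h1 appears at all, the first heading is promoted.

```python
from html import escape

# Hypothetical roles; the project's own role taxonomy may differ.
ROLE_TAGS = {"hero_title": "h1", "section_title": "h2", "card_title": "h3", "body": "p"}


def assign_tags(items):
    """items: (role, text) pairs in reading order -> (tag, text) with exactly one <h1>."""
    out, seen_h1 = [], False
    for role, text in items:
        tag = ROLE_TAGS.get(role, "p")
        if tag == "h1":
            if seen_h1:
                tag = "h2"  # demote every h1 candidate after the first
            seen_h1 = True
        out.append((tag, text))
    if not seen_h1:
        # No hero title: promote the first heading so the page still has an h1
        for i, (tag, text) in enumerate(out):
            if tag in ("h2", "h3"):
                out[i] = ("h1", text)
                break
    return out


def render(tagged):
    return "\n".join(f"<{t}>{escape(x)}</{t}>" for t, x in tagged)
```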

Why it works — the load-bearing decisions

Each of these was a real problem we hit and solved; together they make up the pipeline's logic.

Twin-anchored localization. We assign each OCR/vision box to a known twin string and drop everything else. Junk (logo flourishes, charred pizza crust mistaken for glyphs) can't survive because it matches no known string. This is the single biggest robustness win.
Engine auto-escalation. EasyOCR is fast and free but fragments on script/display fonts. So we try OCR, measure coverage, and escalate to Gemini vision (forced JSON output + retry-keep-best + a “missing sweep”) only when needed. Result: 100% localization on all 8 sites.
Photo-aware erase. On flat panels we inpaint the whole text rect (ghost-free even for script fonts); over a photo or illustration we erase only the glyph strokes so the image stays crisp. Halftone/pattern bands (pop-art) are detected by hue-uniformity and treated as flat.
Boxed-component detection. Pills, badges and buttons (elements whose fill colour is distinct from their surroundings) are kept baked, because erasing them would flatten the styling. Plain text is erased and re-rendered. The two groups separate cleanly: real pills score 100–500 in colour distance, headings 0–7.
Legibility guard. Text colour is sampled from the mock, then flipped light/dark if it wouldn't contrast with the clean plate it actually sits on, so erased pill backgrounds never leave invisible text.
Measured, not eyeballed. A Gemini “fidelity gate” scores each rebuild 0–10 against the mock and lists concrete diffs — every change is verified, not assumed.
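Two of the checks above reduce to small colour computations. This sketch assumes the legibility guard uses the WCAG 2.x contrast ratio (relative luminance, 4.5:1 for normal text) and that boxed-component detection uses a simple channel distance. The document does not name its colour-distance metric, so the sum of absolute RGB differences (range 0–765, which accommodates scores up to 500) and the cutoff of 50 are both hypothetical, as are all function names.

```python
def rel_luminance(rgb):
    """WCAG relative luminance of an sRGB colour (0 = black, 1 = white)."""
    def lin(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast(a, b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    hi, lo = sorted((rel_luminance(a), rel_luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def legible_color(text_rgb, plate_rgb, min_ratio=4.5):
    """Keep the sampled colour if it reads on the plate, else flip to black or white."""
    if contrast(text_rgb, plate_rgb) >= min_ratio:
        return text_rgb
    return max(((0, 0, 0), (255, 255, 255)), key=lambda c: contrast(c, plate_rgb))


def color_distance(a, b):
    # Assumed metric: sum of absolute channel differences (0-765)
    return sum(abs(x - y) for x, y in zip(a, b))


def is_boxed(fill_rgb, surround_rgb, threshold=50):
    """Hypothetical cutoff sitting in the wide gap between headings and real pills."""
    return color_distance(fill_rgb, surround_rgb) >= threshold
```

Because the calibrated scores leave a wide gap (0–7 versus 100–500), the exact cutoff matters little; anything well inside that gap separates the two groups.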

Two outputs, one pipeline

✏️ Overlay — pixel-faithful

Real HTML text absolutely positioned over the clean-plate picture. Reproduces any aesthetic the mock shows, exactly. Best when visual fidelity is the goal. Trade-off: it's a picture with text on top — weak for SEO and it scales rather than reflows.

🔎 Semantic — SEO / accessible

Same visuals, emitted as <header>/<main>/<section>, one <h1> then <h2>/<h3>, <nav>; per-section background slices that lazy-load (better LCP); real photos as <img alt>; CTAs as accessible links; plus JSON-LD, title & meta. The “beautiful and crawlable” hybrid. Reflow (mobile) is the one deferred piece — a separate phone mock will drive it.
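The head of the semantic page (title, meta description, JSON-LD) can be generated straight from the twin. This sketch assumes a hypothetical twin shape with `business`, `name`, `title`, `description` and optional `url` keys. `Restaurant`, `servesCuisine`, `SoftwareApplication` and `applicationCategory` are real schema.org terms, but choosing them for these two business types is our illustration, not necessarily what the builder emits.

```python
import json
from html import escape


def jsonld_for(twin):
    """Build a schema.org object from a (hypothetical) twin dict."""
    if twin.get("business") == "pizzeria":
        data = {"@context": "https://schema.org", "@type": "Restaurant",
                "name": twin["name"], "servesCuisine": "Pizza"}
    else:  # treat everything else as the B2B SaaS case
        data = {"@context": "https://schema.org", "@type": "SoftwareApplication",
                "name": twin["name"], "applicationCategory": "BusinessApplication"}
    if "url" in twin:
        data["url"] = twin["url"]
    return data


def head_html(twin):
    """<title>, meta description and a JSON-LD script block."""
    # Escape "</" so a string in the twin can never close the <script> early
    ld = json.dumps(jsonld_for(twin), ensure_ascii=False).replace("</", "<\\/")
    return (
        f"<title>{escape(twin['title'])}</title>\n"
        f'<meta name="description" content="{escape(twin["description"])}">\n'
        f'<script type="application/ld+json">{ld}</script>'
    )
```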

Results

Fidelity gate score (overlay vs mock, 0–10). The judge is an LLM with ±1–2 points of noise, so treat each score as a band rather than an exact value.

Site	Business	Aesthetic	Score	Status
saas_botanical	B2B SaaS	botanical	9.8	ship
redwood_diner	pizzeria	retro diner	9	ship
redwood_storybook	pizzeria	storybook	9	ship
redwood_tarot	pizzeria	engraved tarot	9	ship
redwood_vaporwave	pizzeria	neon vaporwave	9	ship
saas_deco	B2B SaaS	Art-Deco	9	ship
saas_memphis	B2B SaaS	Memphis	9	ship
redwood_pop	pizzeria	pop-art	5	limit

Engineering & honesty

One source of truth. lib.py holds paths, the Gemini key + call, the twin schema, the role taxonomy, geometry/colour and the photo mask — no duplication, no cross-run imports.

One-command pipeline. run.py <site> runs localize → erase → overlay → semantic → hub. run.py <fixture>:<world> goes from scratch to a finished site (build + render via mock-engine, then the pipeline).
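The two invocation forms of run.py can be told apart by the colon. This is a minimal sketch of the dispatch only: `plan`, `STAGES`, and especially the rule that `<fixture>:<world>` names the site `<fixture>_<world>` are hypothetical (inferred from site names like redwood_diner), and the real stages are replaced by a print.

```python
import argparse

STAGES = ["localize", "erase", "overlay", "semantic", "hub"]


def plan(target):
    """Return (site, ordered steps) for `run.py <site>` or `run.py <fixture>:<world>`."""
    if ":" in target:
        fixture, world = target.split(":", 1)
        if not fixture or not world:
            raise ValueError(f"expected <fixture>:<world>, got {target!r}")
        site = f"{fixture}_{world}"  # hypothetical naming rule
        # From scratch: build and render the mock first, then the normal pipeline
        return site, ["build", "render"] + STAGES
    return target, list(STAGES)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("target", help="<site> or <fixture>:<world>")
    args = parser.parse_args(argv)
    site, steps = plan(args.target)
    for step in steps:
        print(f"[{site}] {step}")  # a real runner would call the stage here
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```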

Tests pin the fixes. test_lib.py (unit) + test_integration.py (erase→overlay→semantic on a synthetic fixture) run offline via run_tests.sh.

Cost is logged. Every Gemini call's token usage is written to spend.log.
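A spend log like this can be an append-only file with one record per call. This sketch assumes a JSON-lines format with hypothetical field names (`prompt_tokens`, `output_tokens`, `label`); the actual spend.log layout is not described. With Google's Python SDKs the counts would typically come from `response.usage_metadata` (`prompt_token_count`, `candidates_token_count`), which the caller passes in here.

```python
import json
import time
from pathlib import Path


def log_spend(path, model, prompt_tokens, output_tokens, label=""):
    """Append one JSON line per Gemini call (format is an assumption)."""
    record = {"ts": time.time(), "model": model, "label": label,
              "prompt_tokens": prompt_tokens, "output_tokens": output_tokens}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def total_tokens(path):
    """Sum token usage over every logged call."""
    totals = {"prompt_tokens": 0, "output_tokens": 0}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        for key in totals:
            totals[key] += rec[key]
    return totals
```

Appending rather than rewriting means a crash mid-run never loses earlier records, and totals per stage fall out of grouping by `label`.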

Reviewed. A high-recall code review found 10 issues; all are fixed and pinned by tests.

Known limit. Pop-art (heavy outlined display type on busy halftone backgrounds) leaves residual smudges — the one non-shipping aesthetic, documented rather than hidden.

Turning a generated design mock into faithful, crawlable HTML