html-builder-v7 · R&D

Turning a generated design mock into faithful, crawlable HTML

We take a baked-text website mock image (from Steve's mock-engine) plus its YAML “twin” — the content contract listing every section → element → text — and rebuild it as real HTML. Two outputs: a pixel-faithful overlay and a semantic / SEO version. Same pipeline drives 8 wildly different aesthetics, with zero per-site code.

8 aesthetics · 2 businesses · 7/8 ship-grade (fidelity ≥ 8) · 100% text localization · twin-anchored · no per-site code

The core idea

An image model can dream up beautiful, arbitrary design that CSS templates can't. But an image is just pixels — no text, no structure, no SEO. v7 keeps the beauty and puts the meaning back: it locates every known string in the picture, erases the baked text to a clean “plate,” and re-renders the words as real HTML on top. Because we already know every string from the twin, we never guess what's text — we only find where it is.

DATA

Twin + Mock

YAML twin (every string, by role) + the rendered mock image.
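
To make the section → element → text shape concrete, the sketch below flattens a hand-written twin (shown already parsed into nested dicts) into one record per string. The schema, the role names and the sample copy are assumptions for illustration only; the real twin schema lives in lib.py and may differ.

```python
from typing import Iterator, NamedTuple


class TwinString(NamedTuple):
    section: str
    role: str
    text: str


# Hypothetical twin content, as it might look after YAML parsing.
TWIN = {
    "hero": {"headline": "Wood-fired pizza, baked to order", "cta": "Order now"},
    "menu": {"heading": "Our menu", "body": "Twelve pies and a daily special."},
}


def iter_strings(twin: dict[str, dict[str, str]]) -> Iterator[TwinString]:
    """Yield every known string together with its section and role."""
    for section, elements in twin.items():
        for role, text in elements.items():
            yield TwinString(section, role, text)


strings = list(iter_strings(TWIN))
```

Every later stage can then work from this flat list: the pipeline never has to decide what counts as text, only where each listed string sits in the image.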

STEP 1

Localize

Find each known string's pixel box — EasyOCR, auto-escalating to Gemini vision for script fonts.

STEP 2

Erase

Photo-aware inpaint → a “clean plate” with text gone but imagery intact.
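
The idea of treating photo pixels differently can be shown on a toy grayscale grid. This is a deliberately simplified stand-in, not the v7 inpainter: it assumes a flat background, fills a text box with the median of the surrounding non-photo pixels, and leaves any pixel covered by the photo mask untouched so a real inpainting step can handle it. All names here are hypothetical.

```python
from statistics import median

Image = list[list[int]]  # rows of grayscale values
Mask = list[list[bool]]  # True where a pixel belongs to a photo


def erase_box(img: Image, box: tuple[int, int, int, int],
              photo_mask: Mask) -> tuple[Image, list[tuple[int, int]]]:
    """Flat-fill a text box (x0, y0, x1, y1; end-exclusive), skipping photo pixels.

    Returns the new image and the (x, y) pixels deferred to a real inpainter.
    """
    x0, y0, x1, y1 = box
    h, w = len(img), len(img[0])
    # Sample a one-pixel ring around the box, ignoring photo pixels.
    ring = [
        img[y][x]
        for y in range(max(y0 - 1, 0), min(y1 + 1, h))
        for x in range(max(x0 - 1, 0), min(x1 + 1, w))
        if not (y0 <= y < y1 and x0 <= x < x1) and not photo_mask[y][x]
    ]
    fill = int(median(ring)) if ring else None
    out = [row[:] for row in img]
    deferred = []
    for y in range(y0, y1):
        for x in range(x0, x1):
            if photo_mask[y][x] or fill is None:
                deferred.append((x, y))  # imagery here must be preserved
            else:
                out[y][x] = fill
    return out, deferred


# 5x5 light background with a dark 3x3 "text" block; one pixel overlaps a photo.
img = [[200] * 5 for _ in range(5)]
for yy in range(1, 4):
    for xx in range(1, 4):
        img[yy][xx] = 0
mask = [[False] * 5 for _ in range(5)]
mask[3][3] = True
plate, deferred = erase_box(img, (1, 1, 4, 4), mask)
```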

STEP 3a

Overlay

Real HTML text positioned over the plate — pixel-faithful to the mock.

STEP 3b

Semantic

Same visuals as crawlable HTML: h1/section/nav, lazy slices, img alt, JSON-LD.

Why it works — the load-bearing decisions

Each of these was a real problem we hit and solved; together they're the logic.

Two outputs, one pipeline

✏️ Overlay — pixel-faithful

Real HTML text absolutely positioned over the clean-plate picture. Reproduces any aesthetic the mock shows, exactly. Best when visual fidelity is the goal. Trade-off: it's a picture with text on top — weak for SEO and it scales rather than reflows.
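
A minimal sketch of the overlay idea, assuming each localized string comes with a pixel box in mock coordinates: boxes are converted to percentages of the mock size, so the text layer scales with the plate image instead of reflowing. The function name, the inline styles and the omission of font sizing are simplifications for illustration, not the actual overlay builder.

```python
from html import escape

Box = tuple[int, int, int, int]  # x0, y0, x1, y1 in mock pixels


def overlay_html(plate_src: str, size: tuple[int, int],
                 items: list[tuple[str, Box]]) -> str:
    """Return a container with the clean plate and absolutely positioned text spans."""
    w, h = size
    spans = []
    for text, (x0, y0, x1, y1) in items:
        style = (
            f"position:absolute;"
            f"left:{x0 / w * 100:.2f}%;top:{y0 / h * 100:.2f}%;"
            f"width:{(x1 - x0) / w * 100:.2f}%;height:{(y1 - y0) / h * 100:.2f}%"
        )
        spans.append(f'<span style="{style}">{escape(text)}</span>')
    return (
        '<div style="position:relative">'
        f'<img src="{escape(plate_src, quote=True)}" alt="" style="display:block;width:100%">'
        + "".join(spans)
        + "</div>"
    )


page = overlay_html("plate.png", (1000, 2000), [("Order now", (100, 500, 300, 560))])
```

Percent-based boxes keep the text registered to the plate at any viewport width, which is also why this output scales rather than reflows.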

🔎 Semantic — SEO / accessible

Same visuals, emitted as <header>/<main>/<section>, one <h1> then <h2>/<h3>, <nav>; per-section background slices that lazy-load (better LCP); real photos as <img alt>; CTAs as accessible links; plus JSON-LD, title & meta. The “beautiful and crawlable” hybrid. Reflow (mobile) is the one deferred piece — a separate phone mock will drive it.
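
The semantic structure can be sketched with the standard library alone. The example below assumes each section carries a heading, a body and a background slice path; it emits a single h1 followed by h2 headings, marks every slice after the first as lazy-loading, and adds a schema.org JSON-LD block. The function names, the LocalBusiness type and the sample values are assumptions, not the v7 emitter.

```python
import json
from html import escape


def json_ld(name: str, url: str, schema_type: str = "LocalBusiness") -> str:
    data = {"@context": "https://schema.org", "@type": schema_type,
            "name": name, "url": url}
    # Escape "</" so the payload cannot close the <script> element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def render_section(level: int, heading: str, body: str,
                   slice_src: str, lazy: bool) -> str:
    loading = ' loading="lazy"' if lazy else ""
    return (
        "<section>"
        f'<img src="{escape(slice_src, quote=True)}" alt=""{loading}>'
        f"<h{level}>{escape(heading)}</h{level}>"
        f"<p>{escape(body)}</p>"
        "</section>"
    )


def render_page(title: str, sections: list[tuple[str, str, str]], ld: str) -> str:
    """First section gets the only <h1> and loads eagerly; the rest use <h2> and lazy slices."""
    parts = [
        render_section(1 if i == 0 else 2, heading, body, src, lazy=i > 0)
        for i, (heading, body, src) in enumerate(sections)
    ]
    return (
        f"<!doctype html><html><head><title>{escape(title)}</title>{ld}</head>"
        f"<body><main>{''.join(parts)}</main></body></html>"
    )


ld = json_ld("Redwood Pizzeria", "https://example.com/")
page = render_page(
    "Redwood Pizzeria",
    [("Wood-fired pizza", "Baked to order.", "slices/hero.png"),
     ("Our menu", "Twelve pies.", "slices/menu.png")],
    ld,
)
```

Keeping the first slice eager and only the later ones lazy matters for LCP: the largest above-the-fold image should not wait behind a lazy-load trigger.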

Results

Fidelity gate score (overlay vs. mock). The judge is an LLM with roughly ±1–2 points of noise, so read each score as a band rather than an exact value.

Site               Business  Aesthetic        Score  Status
saas_botanical     B2B SaaS  botanical        9.8    ship
redwood_diner      pizzeria  retro diner      9      ship
redwood_storybook  pizzeria  storybook        9      ship
redwood_tarot      pizzeria  engraved tarot   9      ship
redwood_vaporwave  pizzeria  neon vaporwave   9      ship
saas_deco          B2B SaaS  Art-Deco         9      ship
saas_memphis       B2B SaaS  Memphis          9      ship
redwood_pop        pizzeria  pop-art          5      limit

The sites

Each links its overview hub (all pipeline stages), the pixel-faithful overlay, and the semantic SEO build.

Engineering & honesty

One source of truth. lib.py holds paths, the Gemini key + call, the twin schema, the role taxonomy, geometry/colour and the photo mask — no duplication, no cross-run imports.

One-command pipeline. run.py <site> runs localize → erase → overlay → semantic → hub. run.py <fixture>:<world> goes from nothing → finished site (build + render via mock-engine, then the pipeline).
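
The two invocation forms can be illustrated with a small planner that turns the command-line target into an ordered stage list. This is a sketch of the dispatch logic only: the stage names come from the description above, while the function name and the sample targets are hypothetical, and the real run.py may be organized differently.

```python
PIPELINE = ["localize", "erase", "overlay", "semantic", "hub"]


def plan(target: str) -> list[str]:
    """Return the ordered stages for `<site>` or `<fixture>:<world>`."""
    if ":" in target:
        fixture, world = target.split(":", 1)
        if not fixture or not world:
            raise ValueError(f"expected <fixture>:<world>, got {target!r}")
        # Starting from nothing: build the twin and render the mock first.
        return ["build", "render"] + PIPELINE
    return list(PIPELINE)


existing_site = plan("redwood_diner")
from_scratch = plan("redwood:diner")  # hypothetical fixture and world names
```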

Tests pin the fixes. test_lib.py (unit) + test_integration.py (erase→overlay→semantic on a synthetic fixture) run offline via run_tests.sh.

Cost is logged. Every Gemini call's token usage is written to spend.log.
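
A spend log of this kind can be as simple as one JSON record per call appended to a file. The sketch below assumes a JSON-lines layout with prompt and output token counts; the actual format of spend.log and the field names are not stated here, so treat them as placeholders.

```python
import json
from pathlib import Path


def log_spend(log_path: str, stage: str, prompt_tokens: int, output_tokens: int) -> None:
    """Append one record per model call (assumed JSON-lines layout)."""
    record = {"stage": stage, "prompt_tokens": prompt_tokens,
              "output_tokens": output_tokens}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def total_tokens(log_path: str) -> int:
    """Sum prompt and output tokens across every logged call."""
    total = 0
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                rec = json.loads(line)
                total += rec["prompt_tokens"] + rec["output_tokens"]
    return total
```

Appending rather than rewriting keeps the log safe across interrupted runs, and a per-stage field makes it easy to see which step drives the cost.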

Reviewed. A high-recall code review found 10 issues; all fixed and pinned by tests.

Known limit. Pop-art (heavy outlined display type on busy halftone backgrounds) leaves residual smudges — the one non-shipping aesthetic, documented rather than hidden.