Reference-site context test for agent-written web pages: reference sites supplied useful structure, but it came with source-site nouns. site-voice-packs separates page structure from copy rhythm, then scores the visible HTML output.

Record

Problem

Reference sites gave useful structure but leaked source-site product nouns

Bottleneck

Keep structure and rhythm without copying spans or importing the source business

Fix

Split context into separate SITE and VOICE files, add source-term boundaries, and score visible HTML output

Result

Stripe/LedgerFlow web output: 63.2 to 92.1 webfit; mimic risk 0.0

Guardrail

Copy safety and claim safety remain gates

Side-by-side webfit comparison for the same LedgerFlow web page prompt: left without context, right with SITE.md and VOICE.md

Built

  • reusable SITE.md and VOICE.md context files
  • concise CLI/package shape
  • source-site contamination guardrails
  • visible before/after examples designed for direct agent inspection
  • site2voice webfit gate for HTML output comparison

Signals

  • without context webfit: 63.2
  • with SITE.md + VOICE.md webfit: 92.1
  • reference-fit delta: +28.9
  • mimic risk: 0.0
  • copy safety: 100.0
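The reference-fit delta is the with-context webfit minus the no-context webfit (92.1 - 63.2 = 28.9). Below is a minimal sketch of a comparison gate built on that delta. It assumes every signal is on a 0 to 100 scale. The `RunSignals` type, the `compare_runs` function, and both thresholds are hypothetical and are not the site2voice interface.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    webfit: float       # 0-100 (assumed scale)
    mimic_risk: float   # 0-100, lower is better (assumed scale)
    copy_safety: float  # 0-100, 100 = no unsafe copy detected


def compare_runs(baseline_webfit, candidate, max_mimic_risk=5.0, min_copy_safety=100.0):
    """Return (reference-fit delta, gate passed) for a with-context run
    compared against a no-context baseline. Thresholds are hypothetical."""
    delta = round(candidate.webfit - baseline_webfit, 1)
    passed = (
        delta > 0                                   # context must actually help
        and candidate.mimic_risk <= max_mimic_risk  # no reward for copying
        and candidate.copy_safety >= min_copy_safety
    )
    return delta, passed
```

With the signals above, `compare_runs(63.2, RunSignals(webfit=92.1, mimic_risk=0.0, copy_safety=100.0))` returns `(28.9, True)`. Requiring `delta > 0` stops a context pack that makes the output worse from passing just because the output is safe.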

webfit

Visible HTML score from structure fit, voice fit, copy safety, and claim safety
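One way the four components could roll up is sketched below, under several assumptions. Each component is a 0 to 100 score, and the four are averaged with hypothetical equal weights. Copy safety and claim safety also act as gates: below a hypothetical threshold they fail the page outright, rather than being traded off against fit. `Components` and `webfit` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Components:
    structure_fit: float  # 0-100 (assumed scale)
    voice_fit: float      # 0-100
    copy_safety: float    # 0-100, 100 = no unsafe copy found
    claim_safety: float   # 0-100, 100 = no unsupported claims found


def webfit(c, weights=(1.0, 1.0, 1.0, 1.0), gate_threshold=100.0):
    """Return (score, gates_passed). Weights and threshold are hypothetical."""
    parts = (c.structure_fit, c.voice_fit, c.copy_safety, c.claim_safety)
    score = round(sum(w * p for w, p in zip(weights, parts)) / sum(weights), 1)
    # Gates are checked separately so a high fit score cannot offset a safety failure.
    gates_passed = c.copy_safety >= gate_threshold and c.claim_safety >= gate_threshold
    return score, gates_passed
```

Returning the gate result alongside the score, instead of folding it into the number, keeps the score readable while still letting a failed gate block the page.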

structure fit

Section order, page jobs, density, navigation, and conversion path
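Of the structure-fit signals, section order is the easiest to sketch. The example assumes each page section is a `<section>` element with a hypothetical `data-job` attribute that names its job (hero, proof, pricing, and so on). It compares the reference and output job sequences with Python's `difflib.SequenceMatcher`. It covers order and page jobs only; density, navigation, and conversion path would each need a separate check.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class SectionJobs(HTMLParser):
    """Collects the (hypothetical) data-job attribute of each <section>, in document order."""

    def __init__(self):
        super().__init__()
        self.jobs = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            job = dict(attrs).get("data-job")
            if job:
                self.jobs.append(job)


def section_jobs(html):
    parser = SectionJobs()
    parser.feed(html)
    parser.close()
    return parser.jobs


def section_order_fit(reference_html, output_html):
    """0-100 similarity of section-job order; difflib treats two empty sequences as a full match."""
    ref = section_jobs(reference_html)
    out = section_jobs(output_html)
    return round(100 * SequenceMatcher(None, ref, out).ratio(), 1)
```

Swapping two adjacent sections in a five-section page (hero, proof, features, pricing, cta becoming hero, features, proof, pricing, cta) lowers this score to 80.0 instead of zeroing it. That matches the idea of rewarding a similar page shape rather than an exact clone.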

mimic risk

Copy-overlap risk against the reference site; a high webfit should not reward copying source text
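Below is a minimal sketch of copy-overlap scoring. It assumes mimic risk is the share of the output's visible word 5-grams that also appear in the reference site's visible text, which is a containment measure. The n-gram length, the 0 to 100 scale, and the function names are assumptions. Visible-text extraction uses only Python's standard `html.parser` and skips script and style content.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text a visitor would see, skipping non-rendered elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_words(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())


def shingles(words, n):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def mimic_risk(source_html, output_html, n=5):
    """Percent of the output's word n-grams that also occur in the source (hypothetical n=5)."""
    out = shingles(visible_words(output_html), n)
    if not out:
        return 0.0
    src = shingles(visible_words(source_html), n)
    return round(100 * len(out & src) / len(out), 1)
```

Measuring containment against the output, rather than symmetric overlap, keeps the score high when a short output copies a few long runs from a large source site.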

boundary

Pattern only; product nouns and facts must come from the new brief
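The boundary can be checked mechanically: a noun taken from the reference site should appear in the output only if the new brief supplies it. Below is a minimal sketch that assumes a hand-kept list of source-site terms and plain-text inputs. `leaked_terms` is a hypothetical helper.

```python
import re


def leaked_terms(output_text, source_terms, brief_text):
    """Return source-site terms that appear in the output but not in the new brief."""

    def mentions(text, term):
        # Whole-word, case-insensitive match; \b assumes terms start and end with word characters.
        return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

    return sorted(t for t in source_terms if mentions(output_text, t) and not mentions(brief_text, t))
```

For example, `leaked_terms("LedgerFlow closes the books. Built on Stripe Connect.", ["Stripe", "Connect", "Atlas"], "LedgerFlow: month-end close for finance teams")` returns `["Connect", "Stripe"]`. If the brief named Stripe as an integration, that term would no longer be flagged.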