Why this exists

Runtime choices are often argued with vague speed claims. Here the method comes first, then the numbers: p50/p95 latency, memory footprint, and quality deltas.
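
As a rough illustration of what "p50/p95 latency" could mean in practice, the sketch below computes both from a list of per-request latencies using the nearest-rank method. The function names (percentile, summarize_latency) and the millisecond unit are assumptions made for this example, not part of any existing script.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Rank is 1-based; ceil keeps the result an actual observed value.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latency(latencies_ms):
    """Hypothetical summary row: sample count plus p50 and p95 in ms."""
    return {
        "n": len(latencies_ms),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
    }
```

Nearest-rank is only one of several percentile definitions; whichever one is used should be stated next to the published numbers so the results can be recomputed from the raw table.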

Next checks

  • reproducible hardware and model configuration
  • tokens/sec and memory tables
  • quantization comparison with quality notes
  • variance report across repeated runs
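
The first and last checks above could be sketched roughly as follows: record the environment alongside the results, then report spread across repeated runs instead of a single number. This uses only the Python standard library; the function names, the dictionary keys, and the choice of tokens/sec as the measured quantity are assumptions for illustration.

```python
import platform
import statistics
import sys


def capture_environment():
    """Record basic host details so a result table can be tied to a machine.
    Model and quantization settings would need to be added by the caller."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


def variance_report(tokens_per_sec_runs):
    """Summarize repeated runs: mean, sample standard deviation, and
    coefficient of variation (stdev as a percentage of the mean)."""
    runs = list(tokens_per_sec_runs)
    if not runs:
        raise ValueError("need at least one run")
    mean = statistics.mean(runs)
    # Sample stdev is undefined for a single run; report 0.0 in that case.
    stdev = statistics.stdev(runs) if len(runs) > 1 else 0.0
    return {
        "runs": len(runs),
        "mean": mean,
        "stdev": stdev,
        "cv_pct": (stdev / mean * 100) if mean else 0.0,
        "min": min(runs),
        "max": max(runs),
    }
```

A high coefficient of variation is a signal that more runs, or a quieter machine, are needed before any comparison between runtimes or quantization levels is meaningful.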

Public line

No performance claim until the measurement script, environment, and raw result table are public.