Why this exists

RAG regressions often appear after prompt edits, model swaps, index rebuilds, or chunking changes. Target: measure those changes before merge

Next checks

  • golden-set retrieval recall and faithfulness suites
  • per-run latency and cost budget
  • score records keyed by commit and dataset hash
  • GitHub Actions gate that compares a pull request to a baseline branch

Public line

Public after a repo, reproducible benchmark, and one evaluation run exist