AI/Data Engineering

Production-grade RAG Evaluation System

Retrieval evaluation pipeline for recall, faithfulness, latency, and cost

system design · eval harness · CI gates next

Why this exists

RAG regressions often appear after prompt edits, model swaps, index rebuilds, or chunking changes. Target: measure the effect of those changes before merge.
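
One way a pre-merge check could look, as a minimal sketch rather than this project's actual harness: compute recall@k for retrieval, then compare a candidate run against a baseline on the four tracked metrics. The `EvalResult` schema, the `gate` function, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Aggregate metrics from one evaluation run (hypothetical schema)."""
    recall_at_k: float     # fraction of relevant docs retrieved, 0..1
    faithfulness: float    # 0..1, scored however the harness defines it
    p95_latency_ms: float  # 95th-percentile end-to-end latency
    cost_usd: float        # total spend for the run


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant document IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def gate(
    baseline: EvalResult,
    candidate: EvalResult,
    max_quality_drop: float = 0.02,    # assumed tolerance, absolute
    max_overhead_ratio: float = 1.10,  # assumed tolerance, relative
) -> list[str]:
    """Return the names of regressed metrics; an empty list means the gate passes."""
    failures = []
    if candidate.recall_at_k < baseline.recall_at_k - max_quality_drop:
        failures.append("recall_at_k")
    if candidate.faithfulness < baseline.faithfulness - max_quality_drop:
        failures.append("faithfulness")
    if candidate.p95_latency_ms > baseline.p95_latency_ms * max_overhead_ratio:
        failures.append("p95_latency_ms")
    if candidate.cost_usd > baseline.cost_usd * max_overhead_ratio:
        failures.append("cost_usd")
    return failures
```

In CI, a non-empty list from `gate` would fail the job and name the metrics that regressed. Quality metrics use an absolute tolerance and latency and cost use a relative one; both choices are assumptions that a real benchmark would need to calibrate.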

Will be made public once a repo, a reproducible benchmark, and one evaluation run exist.