industrial manual QA · authority-aware RAG eval

Retrieval is not the same as support

A RAG system can retrieve the right neighborhood and still cite the wrong span. This harness checks source family, exact citation, gate state, and escalation of unsafe servicing advice.

gate: pass · warn · clarify · escalate
authority: methodology · guidance · regulation
citation: gold span hit · first gold rank
safety: unsafe operational advice escalates
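The two citation checks can be sketched as below. This is a minimal sketch, not the harness's code: it assumes every retrieved or cited passage has a stable span id and that "gold span hit" means the answer cites a gold span id exactly. The span ids are hypothetical.

```python
from typing import Optional, Sequence, Set

def first_gold_rank(ranked: Sequence[str], gold: Set[str]) -> Optional[int]:
    """1-based rank of the first gold span in the retrieval list, or None if absent."""
    for rank, span_id in enumerate(ranked, start=1):
        if span_id in gold:
            return rank
    return None

def gold_span_hit(cited: Sequence[str], gold: Set[str]) -> bool:
    """True only if the answer cites a gold span exactly (same span id)."""
    return any(span_id in gold for span_id in cited)

# Hypothetical item: retrieval reaches the gold span at rank 3,
# but the answer cites the neighboring paragraph instead.
ranked = ["doc-a#p10", "doc-a#p11", "doc-a#p12", "doc-b#p03"]
gold = {"doc-a#p12"}
cited = ["doc-a#p11"]
print(first_gold_rank(ranked, gold))  # 3
print(gold_span_hit(cited, gold))     # False
```

Keeping the two checks separate is what lets a report show "retrieved the right neighborhood" and "cited the wrong span" for the same item.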
91 Q/A items
28 holdout items
68 tests
5/5 safety items exact under hybrid
01 · running

Manual QA over maintenance and safety documents

Initial retrieval looked acceptable because passages were near the right topic.

02 · failure

Nearby citation missed the required support

A loose retrieval score can pass even when the answer cites a span that does not support it, which leaves the answer unusable.

03 · fix

Evaluate source, span, gate, and escalation

The report separates retrieval success from citation readiness.
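One way to keep retrieval success and citation readiness apart is to record each check per item and derive the flags from those checks. A sketch under stated assumptions: `ItemResult`, `evaluate_item`, the field names, and the rule that escalation only applies to items whose expected gate is `escalate` are hypothetical, not the harness's schema. "False comfort" follows the definition used on this page: a top-k retrieval hit without a citation hit.

```python
from dataclasses import dataclass
from typing import Sequence, Set

@dataclass(frozen=True)
class ItemResult:
    item_id: str
    retrieval_hit_at_k: bool  # some gold span is in the top-k retrieved spans
    citation_hit: bool        # the answer cites a gold span exactly
    authority_ok: bool        # cited source family matches the expected family
    gate_ok: bool             # predicted gate state matches the expected state
    escalation_ok: bool       # items that must escalate were escalated

    @property
    def false_comfort(self) -> bool:
        # Retrieval-only review would pass this item; citation review would not.
        return self.retrieval_hit_at_k and not self.citation_hit

    @property
    def citation_ready(self) -> bool:
        return self.citation_hit and self.authority_ok and self.gate_ok and self.escalation_ok


def evaluate_item(item_id: str, ranked: Sequence[str], cited: Sequence[str], gold: Set[str],
                  expected_family: str, cited_family: str,
                  expected_gate: str, predicted_gate: str, k: int = 5) -> ItemResult:
    return ItemResult(
        item_id=item_id,
        retrieval_hit_at_k=any(s in gold for s in ranked[:k]),
        citation_hit=any(s in gold for s in cited),
        authority_ok=cited_family == expected_family,
        gate_ok=predicted_gate == expected_gate,
        # Only items expected to escalate can fail this check.
        escalation_ok=expected_gate != "escalate" or predicted_gate == "escalate",
    )

r = evaluate_item("QA-demo", ranked=["s1", "s2", "s3"], cited=["s2"], gold={"s3"},
                  expected_family="methodology", cited_family="methodology",
                  expected_gate="pass", predicted_gate="pass")
print(r.retrieval_hit_at_k, r.citation_hit, r.false_comfort)  # True False True
```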

Hybrid RRF · v5_t31

Current internal fixture results

recall@5: 0.978
citation hit: 0.945
gate accuracy: 1.000
authority accuracy: 1.000
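The four fixture metrics could be aggregated roughly as below. This assumes recall@5 is per-item recall of gold spans within the top five results, macro-averaged over questions, and that the other three are per-item accuracies; the harness may define them differently. The two items are toy data, not fixture items.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set

def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of an item's gold spans that appear in its top-k retrieved spans."""
    return len(set(ranked[:k]) & gold) / len(gold)

def aggregate(items: List[Dict]) -> Dict[str, float]:
    """Macro-average each check over items, one vote per question."""
    return {
        "recall@5": mean(recall_at_k(it["ranked"], it["gold"]) for it in items),
        "citation_hit": mean(
            float(any(s in it["gold"] for s in it["cited"])) for it in items),
        "gate_accuracy": mean(
            float(it["predicted_gate"] == it["expected_gate"]) for it in items),
        "authority_accuracy": mean(
            float(it["cited_family"] == it["expected_family"]) for it in items),
    }

# Toy items: A is fully supported; B retrieves its gold span only at rank 6
# and cites a methodology source where guidance was expected.
items = [
    {"ranked": ["a1", "a2", "a3"], "gold": {"a2"}, "cited": ["a2"],
     "expected_gate": "pass", "predicted_gate": "pass",
     "expected_family": "regulation", "cited_family": "regulation"},
    {"ranked": ["b1", "b2", "b3", "b4", "b5", "b6"], "gold": {"b6"}, "cited": ["b1"],
     "expected_gate": "warn", "predicted_gate": "warn",
     "expected_family": "guidance", "cited_family": "methodology"},
]
print(aggregate(items))
# {'recall@5': 0.5, 'citation_hit': 0.5, 'gate_accuracy': 1.0, 'authority_accuracy': 0.5}
```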
Value proof · value_proof_v5_t31

What retrieval-only review would miss

5 support gaps after the hybrid run
3 false-comfort items: top-5 retrieval passed but citation support missed
2 authority-confusion retrieval misses in NASA RCM methodology questions
QA089: dense retrieval reached adjacent support; hybrid restored exact safety support
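Hybrid RRF refers to reciprocal rank fusion, which scores each passage as the sum of 1 / (k + rank) over the input rankings. The sketch below assumes the hybrid run fuses a dense ranking with a lexical one (for example BM25); the constant k = 60 is a common default and the span ids are made up, since the run's actual retrievers and settings are not given here. It reproduces the QA089 pattern in miniature: dense retrieval puts an adjacent span first, and a lexical match on the exact span lifts that span to the top after fusion.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def rrf(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, span_id in enumerate(ranking, start=1):
            scores[span_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))

dense = ["sec-adjacent", "noise-1", "noise-2", "sec-exact"]
lexical = ["sec-exact", "noise-3", "sec-adjacent-2"]
print(rrf([dense, lexical])[:2])  # ['sec-exact', 'sec-adjacent']
```

Because RRF only uses ranks, it needs no score calibration between the dense and lexical retrievers, which is one reason it is a common choice for hybrid runs.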

Review packet · review_packet_v5_t31

Five items for SME review

3 false-comfort items: top-5 hit, support check failed
2 P0 items from pump hydraulic-field citation misses
2 authority-confusion gaps where NASA RCM methodology was treated as OSHA regulatory support
CSV: review queue with question, missed checks, gold span, and used spans
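A sketch of how flagged items might become review-queue rows. It assumes the queue is built from per-item flag records and that one item can carry several tags; that assumption would explain how the 3 + 2 + 2 tag counts above fit into five items, but the source does not say so. The `item_id` and `tags` columns, the tag names, and the `;` separator are hypothetical additions to the four listed columns.

```python
import csv
import io
from typing import Dict, List, TextIO

FIELDS = ["item_id", "tags", "question", "missed_checks", "gold_span", "used_spans"]

def build_review_rows(flags: List[Dict]) -> List[Dict[str, str]]:
    """Merge flag records by item id so an item with several tags becomes one row."""
    merged: Dict[str, Dict] = {}
    for flag in flags:
        row = merged.setdefault(flag["item_id"], {
            "item_id": flag["item_id"],
            "tags": [],
            "question": flag["question"],
            "missed_checks": [],
            "gold_span": flag["gold_span"],
            "used_spans": list(flag["used_spans"]),
        })
        row["tags"].append(flag["tag"])
        for check in flag["missed_checks"]:
            if check not in row["missed_checks"]:
                row["missed_checks"].append(check)
    # Join list-valued fields with ";" so each item fits in one CSV row.
    return [
        {**row,
         "tags": ";".join(row["tags"]),
         "missed_checks": ";".join(row["missed_checks"]),
         "used_spans": ";".join(row["used_spans"])}
        for row in merged.values()
    ]

def write_review_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

flags = [
    {"item_id": "QA-a", "tag": "false_comfort", "question": "hypothetical pump question",
     "missed_checks": ["citation_hit"], "gold_span": "g1", "used_spans": ["u1", "u2"]},
    {"item_id": "QA-a", "tag": "P0", "question": "hypothetical pump question",
     "missed_checks": ["citation_hit"], "gold_span": "g1", "used_spans": ["u1", "u2"]},
    {"item_id": "QA-b", "tag": "authority_confusion", "question": "hypothetical RCM question",
     "missed_checks": ["authority"], "gold_span": "g2", "used_spans": ["u3"]},
]
rows = build_review_rows(flags)
buf = io.StringIO()
write_review_csv(rows, buf)
print(len(rows))  # 2
```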

Run

CLI surface

industrial-rag-gate value-proof \
  --output reports/value_proof.json

industrial-rag-gate review-packet \
  --output reports/review_packet.json \
  --csv-output reports/review_packet.csv