internal fixture · v5_t31 · 2026-05-22

Industrial manual QA fails when citations are plausible but not specific

Checks answer paths before they look acceptable in aggregate metrics

authorityright source family
citationright span, not adjacent text
safetyunsafe servicing escalates
01 · runningManual QA over maintenance and safety documents

The first tests looked acceptable because the retrieved passages were near the right topic

02 · failureAdjacent citation looked correct but missed the required authority

Different from generic answer similarity; can pass a loose score and still be unusable

03 · fixEvaluate source family, span hit, and escalation path

The fixture checks whether the answer was grounded in the right document, not just a similar paragraph

91Q/A items
28holdout items
68tests
5/5hybrid safety exact
Evaluation contract
  • Gatepass · warn · clarify · escalate
  • Authoritymethodology · guidance · regulation
  • Citationgold span hit · first gold rank
  • Safetyunsafe operational advice escalates
Source families
  • NASA reliability-centered maintenance
  • OSHA / eCFR hazardous energy control
  • NIOSH hazardous energy guidance
  • DOE / FEMP operations guidance
Result panel

Hybrid RRF, v5_t31 internal report

not a public benchmark
recall@50.978
citation hit0.945
gate accuracy1.000
authority accuracy1.000
recall@5

Whether the expected source appears in the first five retrieved chunks

citation hit

Whether the answer cites the required span, not just a nearby passage

gate accuracy

Whether the system chooses pass, warn, clarify, or escalate correctly

authority accuracy

Whether the answer relies on the right source family for the question

Value proof

What retrieval-only review would miss

value_proof_v5_t31
5support gaps

Full fixture items that still missed retrieval or citation readiness after the hybrid run

3false-comfort items

Expected source was in top-5, but the used citation still missed the gold span

2authority-confusion misses

NASA RCM methodology questions that retrieved nearby material but missed the intended authority span

QA089dense-only residual

Dense retrieval reached adjacent support; hybrid restored exact safety support on the safety cluster

SME review packet

Support gaps are exported as a review queue

review_packet_v5_t31
5review items

Support gaps converted into question, missed-check, gold-span, and used-span rows

3false-comfort queue

Top-5 retrieval hit, but exact support still failed

2P0 items

High-severity pump hydraulic-field citation misses

CSVreview surface

A compact file a document owner can inspect without reading the full report JSON

Failure case

QA089

citation specificity
Beforehybrid rank 6

Exact eCFR stored-energy definition sat just outside top-5

Fixselector + exposure

Same-authority citation specificity, then small lexical exposure residual

Afterhybrid rank 5

Exact span cited; safety cluster moved to 5/5 exact under hybrid

Holdout
items28
recall@51.000
citation hit0.893
gate / safety / authority1.000
Boundary

Internal fixture evidence only; not a public benchmark, safety advice, or production compliance claim; source documents are linked and cited, not redistributed