Catch AP agent failures before payment
A small replay gate for invoice agents. It checks for duplicate invoices, vendor mismatches, missing receipts, missing approvals, active holds, and consignment leakage.
External Log Path
Streaming CSV/XES import keeps the public repo usable without redistributing proprietary or benchmark event logs. Policy templates block missing amount data by default.
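A minimal sketch of what a streaming CSV import with a missing-amount block could look like. The column names (`case_id`, `activity`, `amount`), the `stream_events` function, and the "blocked" row status are assumptions made for illustration; they are not the repo's actual schema or API.

```python
import csv
import io


def stream_events(handle, require_amount=True):
    """Yield one record per CSV row without loading the whole file."""
    # Data rows start at line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        amount = (row.get("amount") or "").strip()
        if require_amount and not amount:
            # Assumed behavior: a row without an amount is surfaced, not dropped.
            yield {"row": line_no, "status": "blocked", "reason": "missing_amount"}
            continue
        yield {
            "row": line_no,
            "status": "ok",
            "case_id": row["case_id"],
            "activity": row["activity"],
            "amount": float(amount) if amount else None,
        }


sample = io.StringIO(
    "case_id,activity,amount\n"
    "c1,invoice,100.50\n"
    "c2,invoice,\n"
)
records = list(stream_events(sample))
```

Because the reader is a generator, an external log of any size can be passed as an open file handle and consumed row by row.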
BPIC2019 Pack
Bundled activity mapping for the public purchase-order handling log. The source dataset is not redistributed; the pack maps accounting evidence into replay events and infers 2-way, 3-way, invoice-before-GR, and consignment policies.
Real smoke run: 1,000 BPIC2019 cases, 15,351 source rows, 10,627 replay events, 0 row errors, 0.970 audit trace coverage.
Use the adapter report before making claims. Activities outside the replay oracle scope should remain visible as unmapped or be handled by a custom activity map. BPIC2019 does not expose the full approval workflow, so the pack keeps approval checks disabled unless custom evidence is supplied.
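The idea of keeping out-of-scope activities visible can be sketched as follows. The mapping below is hypothetical: the source-side labels are written as they are recalled from the public BPIC2019 log and should be checked against the actual file, and the replay-side names and the `adapter_report` function are illustrative, not the bundled pack's map.

```python
from collections import Counter

# Hypothetical activity map: source label -> replay event type.
ACTIVITY_MAP = {
    "Record Goods Receipt": "goods_receipt",
    "Record Invoice Receipt": "invoice",
    "Set Payment Block": "hold",
    "Remove Payment Block": "release",
    "Clear Invoice": "payment",
}


def adapter_report(activities, activity_map=ACTIVITY_MAP):
    """Count mapped and unmapped source rows instead of silently dropping them."""
    mapped, unmapped = Counter(), Counter()
    for name in activities:
        target = activity_map.get(name)
        if target is None:
            unmapped[name] += 1
        else:
            mapped[target] += 1
    return {
        "source_rows": sum(mapped.values()) + sum(unmapped.values()),
        "replay_events": sum(mapped.values()),
        "mapped": dict(mapped),
        "unmapped": dict(unmapped),
    }


report = adapter_report(
    ["Record Goods Receipt", "Record Invoice Receipt", "Change Quantity", "Clear Invoice"]
)
```

In this sketch a custom map can be passed as `activity_map` to bring additional activities into scope; anything still missing stays listed under `unmapped`.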
Agent Action Gate
Before an AP agent clears payment, the proposed action is converted into a replay event and checked against the same policy oracle.
The sample report records baseline state, proposed event, after-action violations, new codes, resolved codes, and an exit code for CI or agent tool wrappers.
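The gate described above can be sketched roughly like this. The `Event` type, the toy `evaluate` oracle, the violation code names, and the exit-code convention (1 when the action introduces new codes) are all assumptions for illustration, not the project's actual rules or report schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    case_id: str
    activity: str
    invoice_id: str = ""


def evaluate(events):
    """Toy policy oracle: replay events in order and collect violation codes."""
    codes, seen_invoices = set(), set()
    goods_received = on_hold = False
    for ev in events:
        if ev.activity == "goods_receipt":
            goods_received = True
        elif ev.activity == "invoice":
            if ev.invoice_id in seen_invoices:
                codes.add("DUPLICATE_INVOICE")
            seen_invoices.add(ev.invoice_id)
        elif ev.activity == "hold":
            on_hold = True
        elif ev.activity == "release":
            on_hold = False
        elif ev.activity == "payment":
            if not goods_received:
                codes.add("PAYMENT_BEFORE_GR")
            if on_hold:
                codes.add("PAYMENT_WHILE_BLOCKED")
    if on_hold:
        codes.add("ACTIVE_HOLD")  # state-based code that a release can resolve
    return codes


def gate(baseline, proposed):
    """Check a proposed action by replaying it on top of the baseline trace."""
    before = evaluate(baseline)
    after = evaluate(baseline + [proposed])
    new_codes = sorted(after - before)
    return {
        "baseline_codes": sorted(before),
        "proposed_event": proposed.activity,
        "after_action_violations": sorted(after),
        "new_codes": new_codes,
        "resolved_codes": sorted(before - after),
        "exit_code": 1 if new_codes else 0,
    }


baseline = [Event("c1", "invoice", "INV-1"), Event("c1", "hold")]
pay_report = gate(baseline, Event("c1", "payment"))
release_report = gate(baseline, Event("c1", "release"))
```

A CI job or agent tool wrapper could pass `exit_code` to `sys.exit` so that a payment proposal with new violations fails the step.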
Operational Readiness Check
The ops report checks replay hygiene over the supplied log. It does not claim production SLA coverage.
| Check | Fixture result | Why it matters |
|---|---|---|
| idempotency | 0 duplicate event_id rows | safe retry and replay behavior starts with stable event identity |
| ordering | 0 input inversions | late or out-of-order events remain visible before policy replay |
| schema | 51 / 51 events versioned | event contract can evolve without silent parser drift |
| parallel replay | serial output matches 4-worker output | case-level partitioning stays deterministic under concurrent evaluation |
| digest | SHA-256 replay fingerprint | same event and policy payload can be compared across runs |
| resume | last case/timestamp/event cursor | batch replay can expose a restart point |
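A few of the checks in the table can be sketched as below. The event field names, the definition of an inversion (a row whose timestamp is earlier than the latest one already seen for the same case), and the canonical-JSON digest are assumptions; the actual ops report may define them differently.

```python
import hashlib
import json


def hygiene_report(events, policy):
    """Compute simple replay-hygiene signals over events in input order."""
    ids = [e["event_id"] for e in events]
    duplicate_rows = len(ids) - len(set(ids))

    inversions, latest = 0, {}
    for e in events:
        seen = latest.get(e["case_id"])
        # ISO-8601 UTC strings in one fixed format compare correctly as text.
        if seen is not None and e["timestamp"] < seen:
            inversions += 1
        else:
            latest[e["case_id"]] = e["timestamp"]

    versioned = sum(1 for e in events if e.get("schema_version"))

    # Canonical JSON so the same event and policy payload hashes the same way.
    payload = json.dumps(
        {"events": events, "policy": policy}, sort_keys=True, separators=(",", ":")
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()

    last = events[-1] if events else {}
    return {
        "duplicate_event_id_rows": duplicate_rows,
        "input_inversions": inversions,
        "versioned": f"{versioned} / {len(events)}",
        "digest": digest,
        "resume": {k: last.get(k) for k in ("case_id", "timestamp", "event_id")},
    }


def make(event_id, case_id, ts):
    return {"event_id": event_id, "case_id": case_id, "timestamp": ts, "schema_version": 1}


e1 = make("evt-1", "c1", "2024-01-01T10:00:00Z")
e2 = make("evt-2", "c1", "2024-01-01T11:00:00Z")
e3 = make("evt-3", "c2", "2024-01-01T09:00:00Z")
policy = {"match": "3-way"}

ordered = hygiene_report([e1, e2, e3], policy)
shuffled = hygiene_report([e2, e1, e3], policy)
```

The digest covers input order as well as content, so `ordered` and `shuffled` produce different fingerprints even though they contain the same events.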
Persistent Replay Store
The store replay path adds a local SQLite event store, idempotent append, bounded ingest flushes, partition checkpoints, and simulated interruption recovery.
| Check | Fixture result | Model |
|---|---|---|
| append pressure | 255 attempted appends | 5 repeated passes over 51 unique events |
| idempotency | 204 duplicate retries | event_id primary key, at-least-once append |
| backpressure | peak queue depth 16 | bounded local flush threshold |
| partition replay | 4 partitions | stable sha256(case_id) modulo partition count |
| recovery | 2 checkpoints before interrupt, 4 after recovery | partition cursor after successful replay |
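The append and partition model in the table can be illustrated with a short sketch. The `ReplayStore` class, table layout, and synthetic events are hypothetical; only the general technique (an `event_id` primary key with `INSERT OR IGNORE`, a bounded flush threshold, and `sha256(case_id)` modulo the partition count) follows the table. Checkpoints and interruption recovery are left out.

```python
import hashlib
import sqlite3


def partition_for(case_id, partitions=4):
    """Stable partition: sha256(case_id) modulo partition count."""
    digest = hashlib.sha256(case_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


class ReplayStore:
    def __init__(self, path=":memory:", flush_threshold=16):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "event_id TEXT PRIMARY KEY, case_id TEXT NOT NULL, "
            "part INTEGER NOT NULL, payload TEXT NOT NULL)"
        )
        self.flush_threshold = flush_threshold
        self.queue = []
        self.attempted = self.inserted = self.duplicates = 0
        self.peak_queue_depth = 0

    def append(self, event_id, case_id, payload):
        self.queue.append((event_id, case_id, partition_for(case_id), payload))
        self.attempted += 1
        self.peak_queue_depth = max(self.peak_queue_depth, len(self.queue))
        if len(self.queue) >= self.flush_threshold:
            self.flush()  # bounded queue: write out before it grows further

    def flush(self):
        with self.conn:  # one transaction per flush
            for row in self.queue:
                cur = self.conn.execute(
                    "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)", row
                )
                if cur.rowcount == 1:
                    self.inserted += 1
                else:
                    self.duplicates += 1  # retry of an event_id already stored
        self.queue.clear()


# Synthetic stand-in events, not the fixture data.
events = [(f"evt-{i:03d}", f"case-{i % 10}", "{}") for i in range(51)]
store = ReplayStore()
for _ in range(5):  # at-least-once delivery: the same events arrive 5 times
    for event in events:
        store.append(*event)
store.flush()
stored_rows = store.conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With 5 passes over 51 unique events, the sketch sees 255 attempted appends, stores 51 rows, and counts 255 - 51 = 204 duplicate retries, which is the same arithmetic as the table above.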
Replay Shape
The gate tests workflow state, not only final labels.
case policy, vendor, amount, quantity, approval threshold
goods receipt, invoice, hold, release, approval, payment
seeded defect inserted into a clean trace or replay replacement
stateful rule evaluation with event-level evidence
JSON report with summary, failure queue, and metric glossary
Scenario Pack
| Mutation | Count | Policy signal |
|---|---|---|
| duplicate_invoice | 12 | critical |
| vendor_mismatch | 12 | critical |
| amount_overbill | 6 | value mismatch |
| quantity_mismatch | 3 | quantity mismatch |
| payment_before_gr | 3 | critical |
| payment_before_approval | 3 | critical |
| payment_while_blocked | 3 | critical |
| consignment_invoice | 3 | PO-level invoice leak |
| consignment_duplicate_invoice | 3 | critical |
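As an illustration of how mutations like these might be seeded, here is a minimal sketch for three of them. The clean trace, field names, and mutation functions are hypothetical stand-ins, not the scenario pack's implementation.

```python
import copy

# Hypothetical clean trace for one case.
CLEAN_TRACE = [
    {"activity": "goods_receipt", "quantity": 10},
    {"activity": "invoice", "invoice_id": "INV-1", "vendor": "V-1", "amount": 100.0},
    {"activity": "approval"},
    {"activity": "payment"},
]


def duplicate_invoice(trace):
    """Insert a second copy of the invoice event right after the original."""
    out = copy.deepcopy(trace)
    idx = next(i for i, e in enumerate(out) if e["activity"] == "invoice")
    out.insert(idx + 1, copy.deepcopy(out[idx]))
    return out


def vendor_mismatch(trace, wrong_vendor="V-999"):
    """Replace the invoice vendor with one that does not match the case."""
    out = copy.deepcopy(trace)
    for e in out:
        if e["activity"] == "invoice":
            e["vendor"] = wrong_vendor
    return out


def payment_before_gr(trace):
    """Move the goods receipt to the end so payment happens before it."""
    out = copy.deepcopy(trace)
    gr = next(e for e in out if e["activity"] == "goods_receipt")
    out.remove(gr)
    out.append(gr)
    return out


MUTATIONS = {
    "duplicate_invoice": duplicate_invoice,
    "vendor_mismatch": vendor_mismatch,
    "payment_before_gr": payment_before_gr,
}
seeded = {name: mutate(CLEAN_TRACE) for name, mutate in MUTATIONS.items()}
```

Each mutation works on a deep copy, so the clean trace stays available as the baseline for the next scenario.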
Metric Notes
| Metric | Meaning |
|---|---|
| critical caught | seeded critical rows where every expected code was emitted |
| duplicate recall | duplicate invoice scenarios caught by the duplicate rule |
| false holds | clean traces that produced an unnecessary hold finding |
| audit coverage | runs containing the events needed to audit the policy decision |
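Two of these metrics can be sketched as simple counts. The row shape (`kind`, `expected`, `emitted`) and the code names are assumptions for illustration, and the sketch treats any finding on a clean trace as a false hold, which may be broader than the report's own definition.

```python
def summarize(rows):
    """Count caught critical scenarios and clean traces with findings."""
    critical = [r for r in rows if r["kind"] == "critical"]
    clean = [r for r in rows if r["kind"] == "clean"]
    # A critical row counts as caught only if every expected code was emitted.
    caught = sum(1 for r in critical if set(r["expected"]) <= set(r["emitted"]))
    false_holds = sum(1 for r in clean if r["emitted"])
    return {
        "critical_caught": caught,
        "critical_total": len(critical),
        "false_holds": false_holds,
    }


rows = [
    {"kind": "critical", "expected": ["DUPLICATE_INVOICE"], "emitted": ["DUPLICATE_INVOICE"]},
    {"kind": "critical", "expected": ["PAYMENT_BEFORE_GR", "PAYMENT_WHILE_BLOCKED"],
     "emitted": ["PAYMENT_BEFORE_GR"]},
    {"kind": "clean", "expected": [], "emitted": []},
    {"kind": "clean", "expected": [], "emitted": ["ACTIVE_HOLD"]},
]
summary = summarize(rows)
```

The second critical row is not counted as caught because only one of its two expected codes was emitted.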
Fixture evidence only. No real vendor, invoice, payment, or company data is included.