01 · Work

Systems first, artifacts second

Large problem class, clear failure mode, reproducible result

Flagship systems

Operational AI/data problems

01 / 07

Industrial RAG Gate

AI evaluation · industrial safety · fixture design · retrieval eval · authority gate · 2026

Evaluates whether industrial manual answers cite the right authority and escalate unsafe servicing questions

Python · RAG evaluation · sentence-transformers · Hybrid RRF · unittest
● 91-item fixture ● 28-item holdout ● review packet ● 68 tests ○ second document family ○ SME review loop ○ case record
Review queue
5 items · 2 P0
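The stack above lists Hybrid RRF. The project's own fusion code is not shown here, so the following is only a minimal sketch of standard reciprocal rank fusion, assuming two ranked lists (one lexical, one dense) and the commonly used constant k = 60. The function name and document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank is 1-based).
    k = 60 is a common default, assumed here rather than taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to doc id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ids, for illustration only.
lexical = ["manual-7.2", "manual-3.1", "bulletin-12"]
dense = ["manual-3.1", "bulletin-12", "manual-7.2"]
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)
```

A document ranked near the top by both retrievers outranks one that only a single retriever favors, which is the usual reason to fuse ranks rather than raw scores.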
02 / 07

P2P Replay Gate

Agent workflow evaluation · procurement controls · state replay · action gate · policy oracle · 2026

Pre-checks purchase-to-pay agent actions against replayed workflow state before payment

Python · Workflow replay · SQLite · Policy oracle · Agent evaluation · GitHub Actions
● agent action gate ● persistent store replay ● ops-readiness report ● BPIC2019 smoke ● 66 tests ○ larger BPIC2019 run ○ trace visualizer ○ external AP workflow feedback
Recovery
recovered
03 / 07

Production-grade RAG Evaluation System

AI/Data Engineering · system design · eval harness · CI gates · next

Gate prompt, model, index, and chunking changes on retrieval quality, latency, and cost

Python · FastAPI · PostgreSQL · pgvector · Docker · GitHub Actions
● design note ○ benchmark ○ demo ○ runbook
Target gate
recall · faithfulness · cost
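One possible reading of the target gate (recall, faithfulness, cost) is a baseline-versus-candidate check that fails a change when quality drops or cost grows beyond a tolerance. This system is still at the design-note stage, so the sketch below is an assumption throughout: the `EvalRun` fields, the `gate` function, the tolerances, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    # Hypothetical summary metrics for one evaluation run.
    recall_at_k: float
    faithfulness: float
    cost_per_query_usd: float


def gate(baseline, candidate, max_quality_drop=0.01, max_cost_ratio=1.10):
    """Return the names of the checks the candidate fails (empty list = pass)."""
    failures = []
    if candidate.recall_at_k < baseline.recall_at_k - max_quality_drop:
        failures.append("recall")
    if candidate.faithfulness < baseline.faithfulness - max_quality_drop:
        failures.append("faithfulness")
    if candidate.cost_per_query_usd > baseline.cost_per_query_usd * max_cost_ratio:
        failures.append("cost")
    return failures


# Illustrative numbers only, not measured results.
baseline = EvalRun(recall_at_k=0.82, faithfulness=0.90, cost_per_query_usd=0.0040)
candidate = EvalRun(recall_at_k=0.84, faithfulness=0.86, cost_per_query_usd=0.0041)
failures = gate(baseline, candidate)
print(failures or "pass")
```

In CI, a non-empty result would typically translate into a non-zero exit code so the prompt, model, index, or chunking change cannot merge unreviewed.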
04 / 07

AI-native Research Workspace

Full-stack AI · React client · Python backend · retrieval inspection · next

Compare retrieval changes, prompt diffs, eval output, and model responses in one local review surface

TypeScript · React · FastAPI · WebSocket · SQLite
● product direction ○ prototype ○ screen capture ○ usage notes
Target user
self-use first
05 / 07

Low-level Inference Runtime Benchmark Lab

Modeling · Systems · profiling · quantization · runtime comparison · next

Track latency, memory, and quality deltas across runtime and quantization choices

PyTorch · Triton · CUDA profiling · C++ bindings · Nsight
● measurement plan ○ tokens/sec table ○ variance report ○ methodology
Primary metric
latency / quality trade-off
06 / 07

Replenishment Policy Gate

Data systems · replenishment policy · forecasting · base-stock policy · stockout gate · 2026

Gates model-informed reorder policies against service-floor, lead-time, cost, and SKU-level stockout risks

Python · pandas · scikit-learn · simulation · GitHub Actions
● UCI dataset ● failure-mode report ● frontier gate ● lead-time uncertainty ● SKU diagnostics ○ second dataset check ○ operator feedback
Gate
review
07 / 07

MLOps / Data Quality / Deployment Layer

MLOps · data contracts · drift checks · deploy surface · next

Track dataset identity, score provenance, drift, and deployment health

Great Expectations · GitHub Actions · Docker · Prometheus · OpenTelemetry
● operational target ○ quality gates ○ alerts ○ rollback runbook
Focus
ops readiness
Supporting artifacts

Reusable tools and focused models

01 / 03

tool-tax

Agent systems · measurement · CLI · MCP proxy · PyPI release · 2026

Measures hidden tool-schema surface across MCP servers, OpenAPI files, and agent tool catalogs

Python · MCP · CLI · PyPI · CI
● repo ● package ● public benchmark ● config risk lint ○ real host traces ○ external MCP feedback
Risk lint
5 findings
02 / 03

site-voice-packs

AI web context · agent input files · SITE.md · VOICE.md · webfit gate · 2026

Separates website structure from copy rhythm so reference style does not leak source-site subject matter

Python · CLI · Web analysis · Agent context
● repo ● package ● visible comparison ● webfit gate ○ second site corpus ○ external builder feedback
webfit delta
+28.9
03 / 03

Modulation-aware Key Estimator

Applied audio ML · model inference · CLI · FastAPI · release asset · 2026

Estimates region-wise musical key instead of forcing one global label on the whole track

Python · PyTorch · FastAPI · Audio ML
● repo ● checkpoint release ● SHA-256 loader ○ training provenance
Model surface
inference ready