In a tool-heavy agent workflow test, context was spent before task reasoning began. tool-tax turns the invisible tool-schema surface into a number that can be reviewed in CI.

Record

Problem

Tool catalogs enter agent context as a hidden fixed cost

Bottleneck

MCP, OpenAPI, and custom manifests expose schema surface in different formats

Fix

Normalize schema cost, expose reports, support PR diffs, benchmark public catalogs, lint MCP config risk

Result

10 public catalogs: 3,429 tools; risk lint sample: 2 servers, 5 findings, high risk

Guardrail

Local estimates only; not provider billing

Built

  • CLI for MCP configs, OpenAPI files, and agent tool catalogs
  • Schema budget reports, PR diffs, and CI thresholds
  • Progressive-loading index generation
  • Lazy-schema MCP stdio proxy experiment
  • Public benchmark manifest over MCP and OpenAPI catalogs
  • No-probe MCP config risk lint for env, shell, package runner, and filesystem scope
  • PyPI distribution path
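The PR-diff and CI-threshold items above can be illustrated with a minimal sketch. It assumes a report is simply a mapping from tool name to estimated tokens; the function names `diff_budgets` and `within_budget`, the sample tool names, and the numbers are all hypothetical and are not taken from the tool-tax CLI.

```python
# Hypothetical sketch: compare two schema-budget reports and gate on a threshold.
# A "report" here is assumed to be {tool_name: estimated_tokens}.

def diff_budgets(base: dict[str, int], head: dict[str, int]) -> dict[str, int]:
    """Return the per-tool token delta (head minus base), omitting unchanged tools."""
    names = sorted(set(base) | set(head))
    deltas = {name: head.get(name, 0) - base.get(name, 0) for name in names}
    return {name: delta for name, delta in deltas.items() if delta != 0}


def within_budget(report: dict[str, int], max_total: int) -> bool:
    """True when the summed estimate stays at or under the configured ceiling."""
    return sum(report.values()) <= max_total


# Illustrative values only.
base = {"read_file": 310, "write_file": 420}
head = {"read_file": 310, "write_file": 455, "search": 600}

delta = diff_budgets(base, head)          # which tools grew, shrank, or appeared
ok = within_budget(head, max_total=1200)  # a CI job could exit non-zero when False
```

A CI step built this way would print `delta` in the PR and fail the job when `ok` is false, so schema growth shows up in review rather than at runtime.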

Signals

  • public benchmark: 10 catalogs, 3,429 tools, 1,442,056 estimated full-tax tokens
  • slim index: 169,423 estimated tokens, 88.3% schema-surface reduction
  • filesystem MCP proxy: 2,102 to 260 estimated tokens
  • risk lint sample: 2 servers, 5 findings, high risk; CI can fail on risk level
  • Stripe OpenAPI: heaviest public catalog in the current benchmark
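The reduction figures above follow from simple arithmetic on the reported estimates. The sketch below recomputes them; `reduction_pct` is a hypothetical helper, and the proxy percentage is derived here from the two reported numbers rather than quoted from the benchmark.

```python
def reduction_pct(full: int, slim: int) -> float:
    """Percentage of estimated tokens removed when going from full to slim."""
    return round(100 * (1 - slim / full), 1)


# Public benchmark: full-tax estimate vs. slim-index estimate.
benchmark_reduction = reduction_pct(1_442_056, 169_423)  # 88.3

# Filesystem MCP proxy: 2,102 to 260 estimated tokens (derived, not reported).
proxy_reduction = reduction_pct(2_102, 260)  # 87.6
```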

full-tax tokens

Estimated token cost when complete tool schemas are loaded up front

slim-index tokens

Estimated cost when only a compact tool index is loaded first
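The difference between the two measures can be sketched as follows. This is an illustration under stated assumptions, not the project's estimator: it uses a rough four-characters-per-token heuristic instead of a real tokenizer, and the names `estimate_tokens`, `full_tax`, `slim_index`, and the sample tool are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a provider tokenizer


def estimate_tokens(obj) -> int:
    """Estimate tokens from the compact JSON length of an object."""
    text = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return max(1, -(-len(text) // CHARS_PER_TOKEN))  # ceiling division


def full_tax(tools: list[dict]) -> int:
    """Cost when every complete tool schema is loaded up front."""
    return sum(estimate_tokens(tool) for tool in tools)


def slim_index(tools: list[dict]) -> list[dict]:
    """Compact index: name plus a truncated description, no input schema."""
    return [
        {"name": tool["name"], "summary": tool.get("description", "")[:80]}
        for tool in tools
    ]


# Illustrative tool definition in the shape MCP servers commonly advertise.
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at the given path.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Absolute file path"},
                "encoding": {"type": "string", "enum": ["utf-8", "latin-1"]},
            },
            "required": ["path"],
        },
    }
]

full = full_tax(tools)
slim = full_tax(slim_index(tools))
```

Under this model the full schema for a tool would be fetched only when the agent selects it from the index, which is where the saving comes from.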

config risk

No-probe lint for literal secrets, shell eval, unpinned runners, and broad filesystem scope
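A no-probe lint of this kind only reads the config and never starts a server. The sketch below assumes the common `mcpServers` JSON layout with `command`, `args`, and `env` keys; the rule names, the pattern lists, and the sample config are hypothetical and far narrower than a real rule set.

```python
import re

# Assumptions: these heuristics are illustrative, not the project's rules.
SECRET_KEY = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
SHELLS = {"sh", "bash", "zsh"}
RUNNERS = {"npx", "uvx", "bunx", "pipx"}
BROAD_PATHS = {"/", "~", "$HOME"}


def lint_server(name: str, server: dict) -> list[tuple[str, str, str]]:
    """Return (server, rule, detail) findings without executing anything."""
    findings = []
    command = server.get("command", "")
    args = [str(arg) for arg in server.get("args", [])]

    # Literal secrets: a secret-looking env key with an inline, non-referenced value.
    for key, value in server.get("env", {}).items():
        if SECRET_KEY.search(key) and value and not str(value).startswith("$"):
            findings.append((name, "literal-secret", key))

    # Shell eval: a shell invoked with -c runs an arbitrary command string.
    if command in SHELLS and "-c" in args:
        findings.append((name, "shell-eval", command))

    # Unpinned runner: first non-flag argument has no version pin.
    if command in RUNNERS:
        packages = [arg for arg in args if not arg.startswith("-")]
        if packages:
            spec = packages[0].lstrip("@")  # ignore the npm scope marker
            if "@" not in spec and "==" not in spec:
                findings.append((name, "unpinned-runner", packages[0]))

    # Broad filesystem scope: root or home passed as an argument.
    for arg in args:
        if arg in BROAD_PATHS:
            findings.append((name, "broad-filesystem", arg))
    return findings


def lint_config(config: dict) -> list[tuple[str, str, str]]:
    """Lint every server entry under the assumed mcpServers key."""
    results = []
    for name, server in config.get("mcpServers", {}).items():
        results.extend(lint_server(name, server))
    return results


# Made-up config for illustration; not the sample reported in this document.
config = {
    "mcpServers": {
        "files": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/"],
        },
        "custom": {
            "command": "bash",
            "args": ["-c", "echo hello"],
            "env": {"API_TOKEN": "abc123"},
        },
    }
}

findings = lint_config(config)
```

A CI gate could map rule names to severity levels and fail the job when any finding reaches the configured level.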

guardrail

Local estimate only; not provider billing