Proposed work: add MCP schema/runtime conformance tests #794

Bounty #722

This is a proposed-work intake item for the live #722 proposed-work bounty. It does not make implementation work claimable unless maintainers later create and reserve a separate implementation bounty.

## Problem

MergeWork's public MCP endpoint now exposes useful `tools/list` schemas, but recent live checks show the advertised schemas and `tools/call` runtime validation can drift. When that happens, MCP clients and agents cannot safely treat `tools/list` as the contract: schema-valid calls can fail at runtime, while schema-invalid calls can succeed.

This is not just a single-field bug. The current pattern has already appeared in multiple `submit_work_proof` paths:

- undeclared arguments were accepted despite `additionalProperties: false`;
- some declared selector combinations were schema-valid but rejected by runtime;
- the advertised `format` enum was exactly `["text", "json"]`, while runtime accepted aliases such as `JSON` and ` JSON `.
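
To make the drift categories above concrete, here is a minimal, stdlib-only sketch. Everything in it is an assumption for illustration: `SCHEMA` is a hypothetical fragment rather than the real `submit_work_proof` schema, `schema_accepts` checks only the three keywords used here, and `lenient_runtime_accepts` is a stand-in for a runtime that normalizes aliases and ignores unknown keys. It is not MergeWork's actual validator.

```python
from typing import Any

# Hypothetical schema fragment modeled on the properties described above.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"format": {"type": "string", "enum": ["text", "json"]}},
    "additionalProperties": False,
}


def schema_accepts(schema: dict[str, Any], args: dict[str, Any]) -> bool:
    """Check only additionalProperties, string type, and enum."""
    props = schema.get("properties", {})
    if schema.get("additionalProperties") is False and set(args) - set(props):
        return False
    for name, value in args.items():
        spec = props.get(name, {})
        if spec.get("type") == "string" and not isinstance(value, str):
            return False
        if "enum" in spec and value not in spec["enum"]:
            return False
    return True


def lenient_runtime_accepts(args: dict[str, Any]) -> bool:
    """Stand-in runtime: trims and lowercases `format`, ignores extra keys."""
    fmt = str(args.get("format", "text")).strip().lower()
    return fmt in {"text", "json"}


def drift(args: dict[str, Any]) -> str:
    """Classify one call as aligned or as one of the two drift directions."""
    by_schema = schema_accepts(SCHEMA, args)
    by_runtime = lenient_runtime_accepts(args)
    if by_schema == by_runtime:
        return "aligned"
    return "runtime too permissive" if by_runtime else "runtime too strict"
```

With these stand-ins, `drift({"format": " JSON "})` and `drift({"format": "json", "extra": 1})` both report `runtime too permissive`, while `drift({"format": "json"})` reports `aligned`.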

Fixing each mismatch one by one helps, but it does not give maintainers a guardrail that future MCP tool schema edits still match runtime behavior.

## Current Evidence

Public/live evidence from #656:

- `https://github.com/ramimbo/mergework/issues/656#issuecomment-4600135684` reports `submit_work_proof` ignoring undeclared arguments even though the schema disallows additional properties.
- `https://github.com/ramimbo/mergework/issues/656#issuecomment-4600251202` reports schema-valid selector shapes that runtime rejects.
- `https://github.com/ramimbo/mergework/issues/656#issuecomment-4601544504` reports `format` enum aliases accepted outside the advertised schema.
- Focused fix PR for the latest enum mismatch: `https://github.com/ramimbo/mergework/pull/793`.

The current code has normal endpoint tests for specific cases in `tests/test_api_mcp.py`, but no shared schema/runtime conformance helper that takes a tool's advertised `inputSchema` and checks representative accepted/rejected values against `tools/call`.

## Proposed Work

Add a focused MCP schema/runtime conformance test layer. A useful first version could:

- fetch or construct the same `tools/list` entries returned by `/mcp`;
- define a small conformance matrix for each MCP tool with representative valid calls and invalid boundary calls;
- assert that schema-invalid examples are rejected by `tools/call` rather than silently normalized or ignored;
- assert that schema-valid examples used by clients still pass at runtime;
- cover exact enum behavior, explicit `null`, undeclared properties, selector exclusivity, numeric canonicalization, and boolean/string type boundaries where those properties are advertised;
- make it easy to add a row when a new MCP tool or schema property is introduced.
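
One possible shape for such a matrix is sketched below. The names `ConformanceRow`, `ROWS`, and `find_mismatches` are hypothetical, and the argument sets are illustrative only; real rows would need whatever other fields the actual `submit_work_proof` schema declares. The `call_tool` callable is an assumed adapter that sends a `tools/call` request and returns whether the runtime accepted the arguments.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ConformanceRow:
    tool: str
    label: str
    arguments: dict[str, Any]
    schema_valid: bool  # what the advertised inputSchema says about these arguments


# Illustrative rows only; not the full submit_work_proof argument set.
ROWS = [
    ConformanceRow("submit_work_proof", "format-exact-json", {"format": "json"}, True),
    ConformanceRow("submit_work_proof", "format-uppercase-alias", {"format": "JSON"}, False),
    ConformanceRow("submit_work_proof", "format-explicit-null", {"format": None}, False),
    ConformanceRow(
        "submit_work_proof", "undeclared-property", {"format": "text", "extra": 1}, False
    ),
]


def find_mismatches(
    rows: list[ConformanceRow],
    call_tool: Callable[[str, dict[str, Any]], bool],
) -> list[str]:
    """Return one message per row where runtime and schema disagree."""
    mismatches = []
    for row in rows:
        accepted = call_tool(row.tool, row.arguments)
        if accepted and not row.schema_valid:
            mismatches.append(
                f"{row.tool}/{row.label}: runtime accepted a schema-invalid call"
            )
        elif not accepted and row.schema_valid:
            mismatches.append(
                f"{row.tool}/{row.label}: runtime rejected a schema-valid call"
            )
    return mismatches
```

In a pytest suite the same rows could feed `pytest.mark.parametrize` with `label` as the test id, so adding coverage for a new tool or property is a one-line change.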

The smallest useful implementation could stay test-only at first. If maintainers prefer runtime enforcement, a later implementation could add JSON-schema validation before tool dispatch, but this proposal does not require that broader runtime change.
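
If maintainers did choose the runtime-enforcement direction later, the core idea could look roughly like the sketch below. All names here (`guarded_call`, `undeclared_only`, the `schemas` and `handlers` mappings) are hypothetical, and the returned dictionaries are placeholders rather than MergeWork's existing invalid-arguments error shape. The validator is pluggable, so a full JSON Schema library could be swapped in for the toy `undeclared_only` check.

```python
from typing import Any, Callable

Schema = dict[str, Any]
Arguments = dict[str, Any]
Validator = Callable[[Schema, Arguments], list[str]]
Handler = Callable[[Arguments], Any]


def undeclared_only(schema: Schema, arguments: Arguments) -> list[str]:
    """Toy validator: reports only properties the schema does not declare."""
    if schema.get("additionalProperties") is not False:
        return []
    declared = schema.get("properties", {})
    return [f"undeclared property: {key}" for key in sorted(arguments) if key not in declared]


def guarded_call(
    name: str,
    arguments: Arguments,
    schemas: dict[str, Schema],
    handlers: dict[str, Handler],
    validate: Validator,
) -> dict[str, Any]:
    """Validate against the advertised schema before the handler runs."""
    if name not in handlers or name not in schemas:
        return {"ok": False, "error": "unknown_tool", "details": []}
    problems = validate(schemas[name], arguments)
    if problems:
        # Placeholder error shape; cap the detail list to keep it bounded.
        return {"ok": False, "error": "invalid_arguments", "details": problems[:5]}
    return {"ok": True, "result": handlers[name](arguments)}
```

The design point is that the schema returned by `tools/list` and the schema checked before dispatch are the same object, which removes one source of drift by construction.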

## Expected Value

This gives maintainers a reusable guardrail for the MCP contract instead of relying on ad hoc bug reports. It helps agents trust `tools/list`, reduces repeated schema drift bugs, and makes future MCP changes easier to review because the tests show which runtime behaviors are intentionally part of the public contract.

It also reduces maintainer review time on PRs that add MCP tools or adjust schemas: reviewers can ask for conformance rows instead of manually checking every schema/runtime edge.

## Reference Tier

100-500 MRWK: useful issue, test, docs page, small bugfix.

## Possible Acceptance Criteria

- Tests compare representative `tools/list` schema expectations against `tools/call` behavior for all public MCP tools with declared input schemas.
- `submit_work_proof` coverage includes at least:
  - exact `format` enum behavior;
  - explicit `null` rejection when the schema says `type: string`;
  - undeclared-property rejection when `additionalProperties: false`;
  - valid and invalid selector combinations;
  - canonical numeric argument handling.
- The conformance helper is reusable for future MCP tools without duplicating long request boilerplate.
- Existing clean examples still pass, including omitted optional arguments that rely on defaults.
- The test names and fixtures make it clear whether a failure means the schema is too permissive, too strict, or runtime validation is drifting.
- No private/admin-only MCP behavior, secrets, payout execution, wallet mutation, bridge, exchange, off-ramp, price, liquidity, or speculative payment behavior is added.
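
Some of the boundary cases named in these criteria could be derived from the advertised schema instead of written by hand. The sketch below is one hedged way to do that; `boundary_cases` is a hypothetical helper, it understands only `additionalProperties: false`, `type: "string"`, and string `enum` values, and it does not account for keywords such as `patternProperties` or union types.

```python
from typing import Any, Iterator


def boundary_cases(
    schema: dict[str, Any], base: dict[str, Any]
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (label, arguments) pairs that the given schema should reject.

    `base` is a known schema-valid argument set; each case changes one thing.
    """
    properties = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        yield "undeclared-property", {**base, "__undeclared__": 1}
    for name, spec in properties.items():
        if spec.get("type") == "string":
            yield f"{name}-explicit-null", {**base, name: None}
            yield f"{name}-boolean-for-string", {**base, name: True}
        for value in spec.get("enum", []):
            if not isinstance(value, str):
                continue
            # Case and whitespace variants that an exact enum should not match.
            for alias in (value.upper(), f" {value} "):
                if alias not in spec["enum"]:
                    yield f"{name}-enum-alias-{alias!r}", {**base, name: alias}
```

Each yielded label can double as a test id, which helps a failing test say which schema rule the runtime did not enforce.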

## Evidence or Tests Required

- Focused MCP tests such as `python -m pytest tests/test_api_mcp.py -q`.
- Full `pytest` if shared MCP helpers or dispatch behavior changes.
- `python -m mypy app` if runtime helpers are touched.
- `ruff check`, `ruff format --check`, and `git diff --check` for touched files.
- If runtime JSON-schema validation is added later, regression tests should show schema-invalid calls return the existing bounded MCP invalid-arguments error shape.

## Duplicate Search

Checked related open MCP issues and PRs before opening this proposal:

- #710 / PR #738 add input schemas for MCP tools. They improve the schema surface, but do not propose a reusable schema/runtime conformance test layer.
- #711 / PR #732 add issue-number selectors, not a cross-tool schema conformance guard.
- #713 / PR #731 add structured MCP JSON content, not validation conformance.
- #714 / PR #734 add safer field-level MCP validation errors, not a test harness that proves schemas and runtime stay aligned.
- #776 is an implementation bounty for MCP ergonomics and typed responses; the prepared work there is broader implementation work, while this proposal is a narrower maintainer-facing test/contract guard.
- #656 bug reports cover individual live mismatches; this proposal targets the recurring test gap that allowed those mismatches.

Searches for "MCP schema runtime conformance", "tools/list tools/call conformance", "MCP inputSchema test harness", and "schema runtime validation MCP" did not find an existing proposed-work issue for this specific guardrail.

## Out of Scope

- No broad MCP redesign.
- No requirement to replace the current Python validators with a JSON-schema library unless maintainers choose that direction later.
- No wallet transfer, payout execution, treasury mutation, custody, bridge, exchange, off-ramp, liquidity, price, private secret, or private security-detail behavior.
- No claim that this proposed-work issue is itself an implementation bounty.
