Skip to content

fix(scanner): strip hardcoded deepwiki URL from Cisco scanner raw stdout#528

Merged
Dumbris merged 3 commits into
smart-mcp-proxy:mainfrom
Ylsssq926:fix/scanner-cisco-stdout-sanitize
May 26, 2026
Merged

fix(scanner): strip hardcoded deepwiki URL from Cisco scanner raw stdout#528
Dumbris merged 3 commits into
smart-mcp-proxy:mainfrom
Ylsssq926:fix/scanner-cisco-stdout-sanitize

Conversation

@Ylsssq926

Copy link
Copy Markdown
Contributor

Summary

The upstream `cisco-ai-mcp-scanner` PyPI package hardcodes `"server_url": "https://mcp.deepwiki.com/mcp\"\` in its raw stdout output. When mcpproxy renders this in the UI it looks like tool definitions are being exfiltrated to a third party, even though `NetworkReq=false` and no request is actually made. This PR sanitizes the display-only `ScannerJobStatus.Stdout` at write time, replacing the offending line with an explanatory annotation while leaving `reportData` (used for findings parsing) untouched.

Implements Option A from #383 as spec'd by @algis-dumbris. Option C (upstream fix at `cisco-ai-defense/mcp-scanner`) is orthogonal and not covered here.

Fixes #383

What changes

  • `setScannerLogs` runs `sanitizeCiscoStdout` only when `scannerID == ciscoScannerID`; the rest of scanners' logs are untouched.
  • The sanitizer regex is narrow: it only matches `server_url` lines pointing at `deepwiki`, and uses `[ \t]` instead of `\s` so the next line's indentation is preserved.
  • The replacement is a one-line annotation; raw stdout that follows is preserved for debugging.
  • The display stdout is sanitized on the way into `ScannerJobStatus.Stdout`. The `reportData` consumed by `parseCiscoScannerOutput` is independent and untouched.
  • `sanitizeCiscoStdout` runs outside the engine mutex (pure function, no shared state).

Test plan

  • `go build -o /dev/null ./cmd/mcpproxy`
  • `go vet ./internal/security/scanner/...`
  • `go test -v ./internal/security/scanner/...` — all green incl. 6 new tests
  • `gofmt -l internal/security/scanner/` — empty
  • `prek run --all-files`
  • Manual: have not run a real Cisco scan locally; happy to add a screenshot if useful

New tests:

  • `TestSanitizeCiscoStdout_RemovesHardcodedDeepwikiURL` / `_PreservesScanResults` / `_NoOpWithoutDeepwiki` / `_HandlesCRLF` — sanitizer in isolation.
  • `TestSetScannerLogs_CiscoStdoutSanitized` / `_NonCiscoScannerStdoutPreserved` — scanner-id dispatch is exercised, including a sanity check that a non-cisco scanner with `deepwiki` literally in its stdout is left alone.

Notes

  • The replacement uses a `//`-style annotation, so the sanitized stdout is no longer valid JSON. Findings parsing is unaffected (`parseCiscoScannerOutput` reads `reportData`, not `Stdout`), but if any downstream consumer pipes `scanner_statuses[].stdout` into a JSON parser this would be a behavior change. Not aware of any such consumer; flagging in case I'm missing one.
  • The regex assumes pretty-printed multi-line output. If upstream ever minifies to single-line JSON, the sanitizer would silently no-op; documented in the helper godoc and tracked as a follow-up if it ever happens.
  • Did not extract the scanner-id literal in `registry_bundled.go` to share a const with `engine.go`. Left a `// keep in sync` note on the new const; happy to do the cross-file refactor in a separate PR if you want.

Ylsssq926 added 2 commits May 26, 2026 11:29
The upstream cisco-ai-mcp-scanner PyPI package emits a hardcoded
"server_url: https://mcp.deepwiki.com/mcp" header in its raw-format
stdout regardless of the actual scan target. mcpproxy was capturing
this verbatim into ScannerJobStatus.Stdout and surfacing it in the
scan report UI, making it look like tool definitions were being
exfiltrated to an unrelated third-party URL. No network request is
actually made (NetworkReq=false, static analyzers only).

The findings parser (parseCiscoScannerOutput) already ignores this
header, so findings stay unchanged. This commit strips the bogus
line from the stdout shown to users and replaces it with a short
annotation referencing the issue.

Closes smart-mcp-proxy#383
…n tests

Address review feedback on smart-mcp-proxy#383:

- Change regex from \s* to [ \t]* around the URL line so it no longer
  eats the following line's indentation on multi-line JSON.
- Extract scanner-id literal into ciscoScannerID const (keep in sync
  with registry_bundled.go).
- Move sanitize call before the engine mutex; the function is pure and
  does not need the lock.
- Add TestSetScannerLogs_CiscoStdoutSanitized and
  TestSetScannerLogs_NonCiscoScannerStdoutPreserved to cover the
  scannerID dispatch path directly.
- Drop TestParseCiscoScannerOutput_UnaffectedBySanitization; the new
  integration tests cover the same intent more precisely.
- Document in godoc that minified single-line JSON bypasses this filter.
@codecov-commenter

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…gistry

registry_bundled.go now references ciscoScannerID instead of duplicating the
"cisco-mcp-scanner" literal. Removes the silent-no-op footgun where an ID drift
would disable stdout sanitization without failing any test (both sides
previously used the literal independently).
@Dumbris Dumbris merged commit a667213 into smart-mcp-proxy:main May 26, 2026
Dumbris added a commit that referenced this pull request Jun 14, 2026
)

The bundled cisco-mcp-scanner runs 'static --tools tools.json': it analyzes
the exported tool definitions with YARA + readiness rules and never probes the
live server endpoint or makes a network request. An is_safe/SAFE result therefore
reflects the tool definitions, not the server's runtime behavior — so a clean
Cisco result for a remote/URL server must not be over-trusted as live coverage.

Previously the only caveat ('no network request was made') was emitted by
sanitizeCiscoStdout solely when the upstream output happened to contain the
hardcoded deepwiki placeholder line. If a future cisco-ai-mcp-scanner release
changes or drops that placeholder, the caveat would silently vanish while the
static-only limitation remains.

Prepend a permanent coverage caveat to every Cisco execution log, independent
of the deepwiki string, leading the output so it survives MaxLogBytes truncation.
Still strip the deepwiki placeholder line when present. Update docs to frame the
limitation as a coverage caveat rather than a purely cosmetic note.

Resolves the residual coverage-honesty concern from gh #383 (cosmetic leak
already fixed in #528).

Related #383
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cisco scanner output shows hardcoded mcp.deepwiki.com URL in execution logs

3 participants