
Fix bootstrap transfer and live analysis job status#9

Open
johnzfitch wants to merge 3 commits into master from fix/bootstrap-live-status

Conversation


@johnzfitch johnzfitch commented Mar 12, 2026

Summary

  • Wire the documented bootstrap load parameter into the actual import and worker analysis paths.
  • Resolve bootstrap sources from loaded programs or on-disk projects, and persist bootstrap stats in the analysis status.
  • Make binaries(jobs=True) merge live .analysis_status fields and recalculate the ETA from current progress instead of returning stale queue-time values.

Testing

  • uv run pytest -q
  • 163 passed, 1 skipped

Copilot AI review requested due to automatic review settings March 12, 2026 05:04
@johnzfitch

@codex full review please, ensure no regressions


Copilot AI left a comment


Pull request overview

This PR updates the MCP server API to a consolidated 8-tool surface area, adds bootstrap-based name transfer during import/analysis, and adjusts tests to reflect the consolidation and new bootstrap behavior.

Changes:

  • Consolidate ~58 MCP tools down to 8 (load, delete, binaries, info, functions, code, xrefs, search) and update _available_tools plus the registration tests accordingly.
  • Add bootstrap support (CLI import --bootstrap, worker forwarding, and server-side bootstrap transfer during load and offline imports).
  • Add/extend job handling for background scan-style tasks with persisted result.json output, plus tests around live job status merging in binaries(jobs=True).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 10 comments.

Files changed:

  • src/pyghidra_lite/server.py: Implements the consolidated tool API, bootstrap transfer, job result persistence helpers, and CLI flag wiring.
  • tests/test_server.py: Updates expectations for the consolidated tool list and adds bootstrap transfer/load tests.
  • tests/test_async_automation.py: Updates tool registration tests for the 8-tool set and adds a bootstrap flag + binaries live-status test.


returns immediately with a unit_id and eta_sec so the agent can poll
analysis_status() without hitting MCP timeouts.
For binaries under 10MB, blocks until analysis completes.
For binaries 10MB+, runs async - poll list(jobs=True) for progress.

Copilot AI Mar 12, 2026


The load() docstring and guidance mention polling list(jobs=True) for large binaries, but there is no list tool registered (the tool is binaries). This will mislead MCP clients and should be updated to reference binaries(jobs=True) (and ideally include how to filter by unit_id).

Suggested change
For binaries 10MB+, runs async - poll list(jobs=True) for progress.
For binaries 10MB+, runs async - poll binaries(jobs=True, unit_id=<returned_unit_id>) for progress.

Comment on lines +1629 to +1633
entry["binary_name"] = p.name
entry["message"] = (
    f"Analysis in progress ({file_size_mb:.0f}MB). "
    f"Poll with analysis_status(unit_ids=['{unit_id}'])"
)

Copilot AI Mar 12, 2026


In the async path, the user-facing message still tells callers to poll analysis_status(unit_ids=[...]), but analysis_status is no longer an MCP tool after consolidation. Update the message to the correct polling mechanism (e.g., binaries(jobs=True)), otherwise clients will fail following the instructions.
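A corrected message along the lines the review asks for might look like the following sketch; the exact wording is a suggestion, not code from the PR.

```python
def progress_message(file_size_mb: float, unit_id: str) -> str:
    # Point callers at the consolidated binaries() tool rather than the
    # retired analysis_status() tool. Phrasing is illustrative only.
    return (
        f"Analysis in progress ({file_size_mb:.0f}MB). "
        f"Poll binaries(jobs=True) and match unit_id '{unit_id}'."
    )
```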

Comment on lines 1656 to 1660
@@ -1437,6 +1659,9 @@ async def import_binary(
            f"Poll with analysis_status(unit_ids=['{unit_id}'])"
        ),

Copilot AI Mar 12, 2026


This queued-response message also instructs polling analysis_status(...), but that tool no longer exists. Align this message with the consolidated API (likely binaries(jobs=True)).

@@ -1437,6 +1659,9 @@ async def import_binary(
            f"Poll with analysis_status(unit_ids=['{unit_id}'])"
        ),
    }
    if bootstrap_source:
        result["bootstrap"] = bootstrap_source

Copilot AI Mar 12, 2026


bootstrap has inconsistent types across responses: for already-analyzed/ready paths it becomes a stats dict from _apply_bootstrap_transfer(...), but for the queued async path it's set to bootstrap_source (a unit_id string). This makes the tool output schema unstable for clients. Consider returning a consistent structure (e.g., {source_unit_id, stats?}) in all cases, or using a separate field name for the source id.

Suggested change
result["bootstrap"] = bootstrap_source
result["bootstrap_source"] = bootstrap_source

if not candidates:
    raise McpError(ErrorData(
        code=INVALID_PARAMS,
        message=f"Not found: {name!r}. Use unit_id from list().",

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete() error message says to use a unit_id from list(), but the consolidated tool is binaries(). This is user-facing guidance and should reference the correct tool name/usage.

Suggested change
message=f"Not found: {name!r}. Use unit_id from list().",
message=f"Not found: {name!r}. Use unit_id from binaries().",

        binary,
        lambda handle: GhidraTools(handle).get_callees(function),
    )
    return _guarded_tool_call("list", op)

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binaries() is wrapped with _guarded_tool_call("list", op), so any INTERNAL_ERROR messages will be prefixed with the wrong action name. Use a consistent action string (e.g., "binaries") so logs/errors match the actual tool.

Suggested change
return _guarded_tool_call("list", op)
return _guarded_tool_call("binaries", op)

entry["eta_sec"] = eta_sec

if job.get("kind") == "scan" and job.get("status") == "complete":
    entry["hint"] = f"Call get_job_result('{unit_id}') for results"

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For scan jobs, binaries() sets a hint to call get_job_result(...), but there is no MCP tool with that name—only an internal helper _get_job_result. As-is, background scan results written to result.json are not retrievable via the public 8-tool API. Either (a) include the scan result in binaries(jobs=True) when complete, or (b) expose a supported way to fetch results (still within the consolidated API), and update this hint accordingly.

Suggested change
entry["hint"] = f"Call get_job_result('{unit_id}') for results"
# For completed scan jobs, include the result directly when available.
try:
    result = _get_job_result(unit_id)  # type: ignore[name-defined]
except Exception as exc:  # pragma: no cover - defensive
    logger.debug("Failed to load scan result for %s: %s", unit_id, exc)
else:
    if result is not None:
        entry["result"] = result

Used by batch_search_strings(background=True) and extract_bunfs().
On completion, job status transitions to "complete" and result is persisted to disk.
The model should poll analysis_status(unit_ids=[job_id]) then call get_job_result(job_id).

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _run_scan_task docstring still tells callers to poll analysis_status(...) and then call get_job_result(...), but neither is part of the consolidated toolset. Update this documentation to match the supported polling/result retrieval flow (e.g., via binaries(jobs=True) and/or an explicit result field).

Suggested change
The model should poll analysis_status(unit_ids=[job_id]) then call get_job_result(job_id).
Callers should poll the job listing API (for example, the ``binaries`` tool with
``jobs=True`` or an equivalent jobs view) until this ``job_id`` reaches a terminal
status ("complete" or "error"). Once complete, the scan output can be retrieved
from the persisted ``result.json`` file and/or any explicit result field exposed
by the corresponding tool response.

Comment on lines +2416 to +2420
return {
    "job_id": job_id,
    "status": "queued",
    "hint": f"Poll: list(jobs=True). On complete: get results via list().",
}

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the background (bg=True) batch-search response, the hint refers to list(jobs=True) and “get results via list()”, but the consolidated tool is binaries(), and there’s no public API path here to retrieve the persisted result.json payload. Update the hint to the real polling/result retrieval mechanism (and ensure that mechanism exists).

Comment on lines +2499 to +2505
# =============================================================================
# LEGACY TOOL ALIASES (hidden from model, route to consolidated tools)
# =============================================================================

Args:
binary: Binary name.
limit: Max results.
"""
from pyghidra_lite.hermes import HermesTools
# The old tool names are no longer registered as @mcp.tool() but can still
# be called via the alias routing in consolidated.py. When list_tools is
# called, only the 8 consolidated tools are returned.

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments reference an alias router in consolidated.py / TOOL_ALIASES, but there is no such module/symbol anywhere in the repository. If backward-compat aliasing is intended, the implementation file needs to exist and be wired in; otherwise this comment block should be removed/updated to avoid pointing readers to non-existent code.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71c29624e8


Comment on lines +596 to +599
unit_id = bootstrap if _UNIT_ID_RE.match(bootstrap) else _find_on_disk(bootstrap)
if unit_id:
    handle = _load_project_into_backend(backend, unit_id)
    if handle is not None:


P1: Validate bootstrap unit IDs against completed analysis state

This branch treats any 16-hex bootstrap value as a valid source and immediately tries to load it, bypassing the .analysis_status readiness checks used by _find_on_disk. Because _load_project_into_backend marks loaded handles as analyzed, _normalize_bootstrap_source can accept sources that were only imported (or left incomplete) and should have been rejected as "not analyzed yet"; in practice, load(..., bootstrap=<unit_id>) can apply transfers from stale/unanalyzed projects after a restart.
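The readiness gate Codex describes could be sketched like this. The `.analysis_status` filename comes from the diff above; the projects-directory layout and the "complete" status value are assumptions for illustration.

```python
import json
import re
from pathlib import Path

_UNIT_ID_RE = re.compile(r"^[0-9a-f]{16}$")


def is_ready_bootstrap_source(projects_dir: Path, candidate: str) -> bool:
    """Accept a 16-hex unit id only if its project finished analysis.

    Sketch only: mirrors the readiness check _find_on_disk applies, so a raw
    unit id cannot bypass the .analysis_status gate after a restart.
    """
    if not _UNIT_ID_RE.match(candidate):
        return False
    status_file = projects_dir / candidate / ".analysis_status"
    if not status_file.is_file():
        return False  # imported but never analyzed, or stale project
    try:
        status = json.loads(status_file.read_text())
    except (OSError, ValueError):
        return False
    return status.get("status") == "complete"
```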


Comment on lines +1037 to +1038
if job.get("kind") == "scan" and job.get("status") == "complete":
    entry["hint"] = f"Call get_job_result('{unit_id}') for results"


P1: Expose completed background scan results via binaries()

For scan jobs, this path only returns a hint to call get_job_result(...) and never includes the persisted result.json payload; however, this commit’s consolidated tool surface no longer exposes a get_job_result MCP tool (only 8 tools remain). As a result, search(..., bg=True) jobs can complete but clients still cannot retrieve their results through MCP, especially after the in-memory job entry expires.


