Fix bootstrap transfer and live analysis job status #9
johnzfitch wants to merge 3 commits into master from
Conversation
@codex full review please, ensure no regressions
Pull request overview
This PR updates the MCP server API to a consolidated 8-tool surface area, adds bootstrap-based name transfer during import/analysis, and adjusts tests to reflect the consolidation and new bootstrap behavior.
Changes:
- Consolidate ~58 MCP tools down to 8 consolidated tools (`load`, `delete`, `binaries`, `info`, `functions`, `code`, `xrefs`, `search`) and update `_available_tools` + registration tests accordingly.
- Add bootstrap support (CLI `import --bootstrap`, worker forwarding, and server-side bootstrap transfer during `load` and offline imports).
- Add/extend job handling for background scan-style tasks with persisted `result.json` output, plus tests around live job status merging in `binaries(jobs=True)`.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `src/pyghidra_lite/server.py` | Implements consolidated tool API, bootstrap transfer, job result persistence helpers, and CLI flag wiring. |
| `tests/test_server.py` | Updates expectations for consolidated tool list and adds bootstrap transfer/load tests. |
| `tests/test_async_automation.py` | Updates tool registration tests for the 8-tool set and adds bootstrap flag + binaries live-status test. |
src/pyghidra_lite/server.py
Outdated
```python
    returns immediately with a unit_id and eta_sec so the agent can poll
    analysis_status() without hitting MCP timeouts.

    For binaries under 10MB, blocks until analysis completes.
    For binaries 10MB+, runs async - poll list(jobs=True) for progress.
```
The load() docstring and guidance mention polling list(jobs=True) for large binaries, but there is no list tool registered (the tool is binaries). This will mislead MCP clients and should be updated to reference binaries(jobs=True) (and ideally include how to filter by unit_id).
```diff
-    For binaries 10MB+, runs async - poll list(jobs=True) for progress.
+    For binaries 10MB+, runs async - poll binaries(jobs=True, unit_id=<returned_unit_id>) for progress.
```
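A client-side polling loop along these lines would follow the corrected guidance. This is a sketch only: the `binaries` callable, the job-dict keys (`unit_id`, `status`), and the terminal statuses are assumptions inferred from the fields mentioned elsewhere in this review, not taken from the PR code.

```python
import time


def poll_until_ready(binaries, unit_id, timeout_sec=300, interval_sec=2.0):
    """Poll binaries(jobs=True) until the given unit's job reaches a terminal state.

    `binaries` is assumed to return a list of job dicts, each carrying at least
    a "unit_id" and a "status" field. Returns the final job entry, or None if
    the job entry is no longer listed (e.g. it expired after completion).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        jobs = binaries(jobs=True)
        entry = next((j for j in jobs if j.get("unit_id") == unit_id), None)
        if entry is None or entry.get("status") in ("complete", "error"):
            return entry
        time.sleep(interval_sec)
    raise TimeoutError(f"analysis of {unit_id} did not finish in {timeout_sec}s")
```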
```python
entry["binary_name"] = p.name
entry["message"] = (
    f"Analysis in progress ({file_size_mb:.0f}MB). "
    f"Poll with analysis_status(unit_ids=['{unit_id}'])"
)
```
In the async path, the user-facing message still tells callers to poll analysis_status(unit_ids=[...]), but analysis_status is no longer an MCP tool after consolidation. Update the message to the correct polling mechanism (e.g., binaries(jobs=True)), otherwise clients will fail following the instructions.
```diff
@@ -1437,6 +1659,9 @@ async def import_binary(
         f"Poll with analysis_status(unit_ids=['{unit_id}'])"
     ),
```
This queued-response message also instructs polling analysis_status(...), but that tool no longer exists. Align this message with the consolidated API (likely binaries(jobs=True)).
src/pyghidra_lite/server.py
Outdated
```diff
@@ -1437,6 +1659,9 @@ async def import_binary(
         f"Poll with analysis_status(unit_ids=['{unit_id}'])"
     ),
 }
+if bootstrap_source:
+    result["bootstrap"] = bootstrap_source
```
bootstrap has inconsistent types across responses: for already-analyzed/ready paths it becomes a stats dict from _apply_bootstrap_transfer(...), but for the queued async path it's set to bootstrap_source (a unit_id string). This makes the tool output schema unstable for clients. Consider returning a consistent structure (e.g., {source_unit_id, stats?}) in all cases, or using a separate field name for the source id.
```diff
-    result["bootstrap"] = bootstrap_source
+    result["bootstrap_source"] = bootstrap_source
```
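Beyond renaming the field, one way to make the schema stable across all paths — as the comment suggests — is a small helper that always emits the same shape. `bootstrap_payload` and its field names are illustrative, not taken from the PR:

```python
def bootstrap_payload(source_unit_id, stats=None):
    """Build a uniform "bootstrap" field for every response path.

    source_unit_id: the unit id the names were transferred from.
    stats: transfer statistics dict when the transfer already ran
           (ready/analyzed paths); None on the queued async path,
           where the transfer happens only after analysis completes.
    """
    payload = {"source_unit_id": source_unit_id}
    if stats is not None:
        payload["stats"] = stats
    return payload
```

Every response would then carry `result["bootstrap"] = bootstrap_payload(...)`, so clients can always read `bootstrap["source_unit_id"]` and treat `stats` as optional.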
src/pyghidra_lite/server.py
Outdated
```python
if not candidates:
    raise McpError(ErrorData(
        code=INVALID_PARAMS,
        message=f"Not found: {name!r}. Use unit_id from list().",
```
The delete() error message says to use a unit_id from list(), but the consolidated tool is binaries(). This is user-facing guidance and should reference the correct tool name/usage.
```diff
-        message=f"Not found: {name!r}. Use unit_id from list().",
+        message=f"Not found: {name!r}. Use unit_id from binaries().",
```
src/pyghidra_lite/server.py
Outdated
```python
    binary,
    lambda handle: GhidraTools(handle).get_callees(function),
)
return _guarded_tool_call("list", op)
```
binaries() is wrapped with _guarded_tool_call("list", op), so any INTERNAL_ERROR messages will be prefixed with the wrong action name. Use a consistent action string (e.g., "binaries") so logs/errors match the actual tool.
```diff
-return _guarded_tool_call("list", op)
+return _guarded_tool_call("binaries", op)
```
src/pyghidra_lite/server.py
Outdated
```python
entry["eta_sec"] = eta_sec

if job.get("kind") == "scan" and job.get("status") == "complete":
    entry["hint"] = f"Call get_job_result('{unit_id}') for results"
```
For scan jobs, binaries() sets a hint to call get_job_result(...), but there is no MCP tool with that name—only an internal helper _get_job_result. As-is, background scan results written to result.json are not retrievable via the public 8-tool API. Either (a) include the scan result in binaries(jobs=True) when complete, or (b) expose a supported way to fetch results (still within the consolidated API), and update this hint accordingly.
```diff
-    entry["hint"] = f"Call get_job_result('{unit_id}') for results"
+    # For completed scan jobs, include the result directly when available.
+    try:
+        result = _get_job_result(unit_id)  # type: ignore[name-defined]
+    except Exception as exc:  # pragma: no cover - defensive
+        logger.debug("Failed to load scan result for %s: %s", unit_id, exc)
+    else:
+        if result is not None:
+            entry["result"] = result
```
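For option (a), the persistence side that backs such a `_get_job_result` helper might look like the following sketch. `JOBS_DIR` and both function names are placeholders: the review only tells us the result is written to a per-job `result.json` on disk, not where or under what names.

```python
import json
from pathlib import Path

# Placeholder location; the real server would use its own project/jobs directory.
JOBS_DIR = Path("/tmp/pyghidra_jobs")


def save_job_result(job_id, result):
    """Persist a completed scan job's payload as <jobs>/<job_id>/result.json."""
    job_dir = JOBS_DIR / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "result.json").write_text(json.dumps(result))


def load_job_result(job_id):
    """Return the persisted payload, or None if the job has no result on disk."""
    path = JOBS_DIR / job_id / "result.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

Because the result survives on disk, `binaries(jobs=True)` can still surface it even after the in-memory job entry expires — which addresses the second Codex finding below as well.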
src/pyghidra_lite/server.py
Outdated
```python
    Used by batch_search_strings(background=True) and extract_bunfs().
    On completion, job status transitions to "complete" and result is persisted to disk.
    The model should poll analysis_status(unit_ids=[job_id]) then call get_job_result(job_id).
```
The _run_scan_task docstring still tells callers to poll analysis_status(...) and then call get_job_result(...), but neither is part of the consolidated toolset. Update this documentation to match the supported polling/result retrieval flow (e.g., via binaries(jobs=True) and/or an explicit result field).
```diff
-    The model should poll analysis_status(unit_ids=[job_id]) then call get_job_result(job_id).
+    Callers should poll the job listing API (for example, the ``binaries`` tool with
+    ``jobs=True`` or an equivalent jobs view) until this ``job_id`` reaches a terminal
+    status ("complete" or "error"). Once complete, the scan output can be
+    retrieved from the persisted ``result.json`` file and/or any explicit result field
+    exposed by the corresponding tool response.
```
```python
return {
    "job_id": job_id,
    "status": "queued",
    "hint": f"Poll: list(jobs=True). On complete: get results via list().",
}
```
In the background (bg=True) batch-search response, the hint refers to list(jobs=True) and “get results via list()”, but the consolidated tool is binaries(), and there’s no public API path here to retrieve the persisted result.json payload. Update the hint to the real polling/result retrieval mechanism (and ensure that mechanism exists).
src/pyghidra_lite/server.py
Outdated
```python
# =============================================================================
# LEGACY TOOL ALIASES (hidden from model, route to consolidated tools)
# =============================================================================

    Args:
        binary: Binary name.
        limit: Max results.
    """
    from pyghidra_lite.hermes import HermesTools

# The old tool names are no longer registered as @mcp.tool() but can still
# be called via the alias routing in consolidated.py. When list_tools is
# called, only the 8 consolidated tools are returned.
```
These comments reference an alias router in consolidated.py / TOOL_ALIASES, but there is no such module/symbol anywhere in the repository. If backward-compat aliasing is intended, the implementation file needs to exist and be wired in; otherwise this comment block should be removed/updated to avoid pointing readers to non-existent code.
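If backward-compat aliasing really is intended, a minimal router of the kind the comments describe could look like this. Everything here is hypothetical: the `TOOL_ALIASES` entries, the legacy names chosen, and the `consolidated` registry are illustrations of the pattern, not code from the repository.

```python
# Maps a legacy tool name to its consolidated replacement plus any
# fixed keyword arguments the old behavior implied.
TOOL_ALIASES = {
    "list_binaries": ("binaries", {}),
    "analysis_status": ("binaries", {"jobs": True}),
    "decompile_function": ("code", {}),
}


def route_legacy_call(name, consolidated, **kwargs):
    """Dispatch a legacy tool name to its consolidated replacement.

    `consolidated` maps consolidated tool names to callables; an unknown
    legacy name raises KeyError. Caller-supplied kwargs override the
    alias's fixed defaults.
    """
    target, fixed = TOOL_ALIASES[name]
    merged = {**fixed, **kwargs}
    return consolidated[target](**merged)
```

Wiring this in (and keeping it out of `list_tools`) would make the comment block accurate; otherwise the comments should simply be deleted.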
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71c29624e8
src/pyghidra_lite/server.py
Outdated
```python
unit_id = bootstrap if _UNIT_ID_RE.match(bootstrap) else _find_on_disk(bootstrap)
if unit_id:
    handle = _load_project_into_backend(backend, unit_id)
    if handle is not None:
```
Validate bootstrap unit IDs against completed analysis state
This branch treats any 16-hex bootstrap value as a valid source and immediately tries to load it, bypassing the .analysis_status readiness checks used by _find_on_disk. Because _load_project_into_backend marks loaded handles as analyzed, _normalize_bootstrap_source can accept sources that were only imported (or left incomplete) and should have been rejected as "not analyzed yet"; in practice, load(..., bootstrap=<unit_id>) can apply transfers from stale/unanalyzed projects after a restart.
src/pyghidra_lite/server.py
Outdated
```python
if job.get("kind") == "scan" and job.get("status") == "complete":
    entry["hint"] = f"Call get_job_result('{unit_id}') for results"
```
Expose completed background scan results via binaries()
For scan jobs, this path only returns a hint to call get_job_result(...) and never includes the persisted result.json payload; however, this commit’s consolidated tool surface no longer exposes a get_job_result MCP tool (only 8 tools remain). As a result, search(..., bg=True) jobs can complete but clients still cannot retrieve their results through MCP, especially after the in-memory job entry expires.
Summary
Testing