feat(fleetnode): route firmware updates through fleet nodes #598

Merged
ankitgoswami merged 6 commits into main from ankitg/fleetnode-firmware-uploads
Jun 29, 2026

Conversation

@ankitgoswami

Contributor

Reviewable diff: +372/-49 across 11 files (excludes generated, test, and story files).

Summary

Operators can now firmware-update miners that are paired behind a fleet node without changing the existing firmware upload flow. Fleetd reuses the uploaded firmware file as the artifact source, issues a fleet-node miner command with a firmware artifact reference, and the node downloads, verifies, and passes a local temp file into the existing plugin firmware update path.

How it works

An operator queues a firmware update the same way they do today, using the existing firmware file ID in the command payload. During command execution, fleetd opens the firmware metadata, attaches the firmware ID, filename, size, SHA-256, and filesystem path to the SDK FirmwareFile, and the remote-node miner adapter converts that into a fleet-node FirmwareUpdateAction plus a download expectation for the target miner.

The fleet-node gateway admits the download only while the command is in flight. For FIRMWARE_PAYLOAD artifacts, it serves bytes directly from firmware storage instead of copying them into command-artifact storage. The fleet node downloads the artifact stream, validates the header, size, and SHA-256, writes the payload to a short-lived temp file, calls the local plugin FirmwareUpdate, then deletes the temp file after the plugin call returns.

flowchart LR
  A["Operator queues firmware update"] --> B["fleetd command worker"]
  B --> C["Open firmware metadata"]
  C --> D["Remote miner adapter"]
  D --> E["Fleet-node ControlStream command"]
  E --> F["Fleet node downloads firmware artifact"]
  F --> G["Validate header, size, and SHA-256"]
  G --> H["Write temp firmware file"]
  H --> I["Plugin FirmwareUpdate"]
  I --> J["Ack command result"]

Areas of the code involved

| Area / package / file | What changed | Why it matters for review |
| --- | --- | --- |
| proto/fleetnodegateway/v1 | Added FirmwareUpdateAction carrying a CommandArtifactRef; generated Go/TS updated | Wire contract between fleetd and fleet nodes |
| server/internal/domain/command | Firmware execution now opens metadata with ID/SHA/path and passes it through sdk.FirmwareFile | Keeps existing operator API while giving remote execution enough artifact identity to dispatch |
| server/internal/domain/miner/remotenode | Remote firmware updates send artifact-aware fleet-node commands with a download expectation | Main server-side routing path for fleet-node-paired miners |
| server/internal/handlers/fleetnode/gateway | Download handler serves FIRMWARE_PAYLOAD from firmware storage when an in-flight expectation matches | Security and lifecycle boundary for firmware artifact downloads |
| server/cmd/fleetnode | Fleet node handles firmware actions by downloading, validating, temp-file writing, invoking plugin firmware update, and cleaning up | Node-side install path and failure handling |
| server/internal/infrastructure/files | Firmware opener returns ID, filename, size, SHA-256, and path; checksum lookup uses an ID index | Avoids byte duplication and repeated metadata lookup work |
| server/sdk/v1 | FirmwareFile and SDK plugin proto bridge carry firmware ID and SHA-256 | Preserves firmware metadata across out-of-process plugins |
| Generated files | Gateway and SDK generated Go/TS/Python outputs updated | Generated; skip except for confirming source proto changes |
| Tests | Added focused coverage for files, gateway firmware downloads, remote miner dispatch, fleet-node handling, command execution, and SDK bridge metadata | Regression coverage for the new command-artifact firmware path |
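As a rough illustration of the gateway row above, download admission tied to in-flight commands might look like the sketch below. All names (`expectation`, `admissions`, `Begin`, `Allowed`) are hypothetical and the real gateway may key and store expectations differently; the sketch only shows the idea that a download is admitted while a matching expectation is registered and refused once the command ends.

```go
package main

import (
	"fmt"
	"sync"
)

// expectation identifies one artifact that one node may download for one command.
// The field set is an assumption made for this sketch.
type expectation struct {
	CommandID  string
	NodeID     string
	ArtifactID string
}

// admissions tracks expectations for commands that are currently in flight.
type admissions struct {
	mu     sync.Mutex
	active map[expectation]struct{}
}

func newAdmissions() *admissions {
	return &admissions{active: make(map[expectation]struct{})}
}

// Begin registers e and returns a function that revokes it when the command finishes.
func (a *admissions) Begin(e expectation) (end func()) {
	a.mu.Lock()
	a.active[e] = struct{}{}
	a.mu.Unlock()
	return func() {
		a.mu.Lock()
		delete(a.active, e)
		a.mu.Unlock()
	}
}

// Allowed reports whether a download request matches an in-flight expectation.
func (a *admissions) Allowed(e expectation) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	_, ok := a.active[e]
	return ok
}

func main() {
	adm := newAdmissions()
	e := expectation{CommandID: "cmd-1", NodeID: "node-a", ArtifactID: "fw-42"}

	end := adm.Begin(e)
	fmt.Println("during command:", adm.Allowed(e))
	end()
	fmt.Println("after command:", adm.Allowed(e))
}
```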

Key technical decisions & trade-offs

  • Reuse existing firmware storage as the artifact source instead of duplicating firmware bytes into command-artifact storage.
  • Use the existing command-artifact download admission model so firmware bytes are only available for matching in-flight commands.
  • Keep CommandSender stable and add an optional artifact-aware sender interface to avoid churning existing fakes and command senders.
  • Write a node-local temp file before plugin invocation because the current plugin bridge passes firmware by filesystem path.
  • Keep node-local firmware caching and public UI/API changes out of scope for this branch.
  • Keep the existing artifact download timeout policy; slow-link firmware deadline tuning is a follow-up operational decision.

Testing & validation

  • go test ./internal/infrastructure/files ./internal/domain/miner/remotenode ./sdk/v1 ./cmd/fleetnode -run 'TestOpenFirmwareFile|TestFindFirmwareFileByChecksum|TestDeleteFirmwareFile_RemovesFromChecksumIndex|TestMiner_FirmwareUpdate|TestDriverGRPCServer_UpdateFirmwarePreservesMetadata|TestMinerCommandActionTimeoutUsesFirmwareBudget|TestHandleMinerCommand_FirmwareUpdate'
  • go test ./internal/handlers/fleetnode/gateway -run 'TestCommandArtifact|TestDownloadCommandArtifactServesFirmwarePayload'
  • go test ./internal/domain/command -run TestExecuteCommandOnDevice_FirmwareUpdatePassesFileMetadata
  • golangci-lint run -c .golangci.yaml in server
  • just gen-sdk-protos
  • git diff --check

just lint was also attempted from the repo root. buf lint completed, then the client lint step stopped because this worktree does not have client/node_modules installed (eslint: command not found).


Compound Engineering
GPT-5

@github-actions github-actions Bot added the javascript, client, server, and shared labels Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (e2e67ffb11982b79159024c0f09d80bf7d74943d...ba10cac6cc6df7bd9d14145b4ffd18ebadadd760, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] Firmware download can consume the entire node-side update budget

  • Category: Reliability
  • Location: server/cmd/fleetnode/minercommand.go:49
  • Description: The new firmwareMinerCommandTimeout is 10m, and the command context created from it is used for both downloading the firmware artifact and calling dev.FirmwareUpdate. The gateway also allows a firmware artifact download to run for 10m, so a large or slow but valid download can consume the full node-side command deadline before the device firmware upload starts.
  • Impact: Legitimate firmware updates can fail or retry after the payload has already been transferred, wasting large amounts of bandwidth. Worse, if the driver or miner begins an update with a nearly expired context, retries can leave the miner in an ambiguous partial-update state.
  • Recommendation: Split the node-side budget into separate download and device-update contexts, or reserve explicit time for dev.FirmwareUpdate before starting the download. Keep the combined budget below the server-side firmware worker timeout, with enough remaining slack for install-status polling and reboot handling.

Notes

I did not find auth bypasses, SQL injection, command injection, pool hijacking, hardcoded payout/wallet substitutions, or protobuf wire-format breakage in the changed hunks. The firmware artifact path includes purpose, size, in-flight command, metadata, and checksum checks, which addresses the main trust-boundary concerns in this diff.


Generated by Codex Security Review |
Triggered by: @ankitgoswami |
Review workflow run

@ankitgoswami ankitgoswami marked this pull request as ready for review June 26, 2026 18:17
@ankitgoswami ankitgoswami requested a review from a team as a code owner June 26, 2026 18:17
Copilot AI review requested due to automatic review settings June 26, 2026 18:17

Copilot AI left a comment

Contributor

Pull request overview

This PR routes firmware updates for fleet-node–paired miners through the fleet node control plane by treating the uploaded firmware file as a command artifact payload (served directly from firmware storage), propagating firmware identity/metadata (ID + SHA-256 + path) through the SDK and command execution path, and adding node-side download/verify/temp-file installation that reuses the existing plugin firmware update flow.

Changes:

  • Added a fleet-node FirmwareUpdateAction that carries a CommandArtifactRef for firmware payload downloads.
  • Extended firmware file handling to expose ID/SHA-256/path metadata and threaded that metadata through SDK/plugin bridges and command execution.
  • Implemented fleet-node firmware artifact download admission + streaming from firmware storage, plus node-side download/validation/temp-file lifecycle and dispatch from remote-node miners with artifact expectations.

Reviewed changes

Copilot reviewed 19 out of 24 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| server/sdk/v1/python/proto_fleet_sdk/generated/pb/driver_pb2.pyi | Generated Python SDK typings updated for firmware file metadata fields. |
| server/sdk/v1/python/proto_fleet_sdk/generated/pb/driver_pb2.py | Generated Python SDK protobuf updated for firmware file metadata fields. |
| server/sdk/v1/plugin.go | Populates firmware ID/SHA-256 when bridging UpdateFirmware between gRPC and SDK types. |
| server/sdk/v1/plugin_test.go | Adds coverage to ensure UpdateFirmware preserves firmware metadata through the bridge. |
| server/sdk/v1/pb/driver.proto | Extends FirmwareFileInfo with id and sha256. |
| server/sdk/v1/interface.go | Extends FirmwareFile struct with ID and SHA256. |
| server/internal/infrastructure/files/service.go | Adds an ID→checksum index to avoid repeated checksum work and support metadata lookups. |
| server/internal/infrastructure/files/firmware.go | Adds OpenFirmwareFileWithInfo (returns ID/size/SHA/path), checksum caching, and index maintenance on save/delete/init. |
| server/internal/infrastructure/files/firmware_test.go | Adds tests for OpenFirmwareFileWithInfo, checksum index rebuild, validation, and delete cleanup. |
| server/internal/handlers/fleetnode/gateway/handler.go | Serves firmware payload downloads from firmware storage when purpose is FIRMWARE_PAYLOAD. |
| server/internal/handlers/fleetnode/gateway/handler_controlstream_test.go | Wires file service into the harness for artifact-related tests. |
| server/internal/handlers/fleetnode/gateway/handler_artifact_test.go | Adds test ensuring firmware payload downloads stream from firmware storage under in-flight expectations. |
| server/internal/domain/miner/remotenode/miner.go | Adds artifact-aware dispatch path and implements remote firmware updates via fleet-node commands + download expectations. |
| server/internal/domain/miner/remotenode/miner_test.go | Adds test asserting firmware update dispatch includes artifact ref + expectations. |
| server/internal/domain/command/execution_service.go | Uses OpenFirmwareFileWithInfo and passes ID/SHA/path through sdk.FirmwareFile during execution. |
| server/internal/domain/command/execution_service_test.go | Adds test asserting firmware file metadata is passed into miner firmware update calls. |
| server/cmd/fleetnode/run.go | Prepares a state-dir-scoped firmware temp root at daemon startup for downloads. |
| server/cmd/fleetnode/minercommand.go | Adds firmware command action timeout, download/verify/temp-file install flow, and a download limiter. |
| server/cmd/fleetnode/minercommand_test.go | Adds tests for firmware timeout selection, download/install success, checksum mismatch, oversized artifacts, limiter behavior, and temp dir cleanup. |
| server/cmd/fleetnode/control.go | Passes the gateway client into miner-command handling so firmware downloads can be fetched. |
| proto/fleetnodegateway/v1/fleetnodegateway.proto | Adds FirmwareUpdateAction to the fleet-node gateway MinerCommand action set. |
| client/src/protoFleet/api/generated/fleetnodegateway/v1/fleetnodegateway_pb.ts | Generated TS client updated for the new firmware update action/message. |

Comment thread server/cmd/fleetnode/minercommand.go
chatgpt-codex-connector[bot]

This comment was marked as outdated.

chatgpt-codex-connector[bot]

This comment was marked as outdated.

@ankitgoswami ankitgoswami force-pushed the ankitg/fleetnode-firmware-uploads branch from 40a9723 to 6932b8b Compare June 26, 2026 18:46
ankitgoswami added a commit that referenced this pull request Jun 26, 2026
- Reserve firmware install polling time after fleet-node uploads

@mcharles-square mcharles-square left a comment

Collaborator

Looks good @ankitgoswami! Question on the flow: if one wants to apply the fw update to only a subset of miners connected to a node before running a full node update, does that mean the fw artifact needs to be re-downloaded to the node to target the remaining miners?

@ankitgoswami ankitgoswami force-pushed the ankitg/fleetnode-firmware-uploads branch from c2b6b25 to ba10cac Compare June 29, 2026 14:57
@ankitgoswami

Contributor Author

@mcharles-square
Yes, I intentionally did not add file caching in this PR. The fleet node writes the artifact to a short-lived temp file, calls the local plugin firmware update, then deletes the temp file after the plugin call returns. The trade-offs:

  • Simpler and safer lifecycle now: no cache invalidation, no stale firmware on nodes, no disk growth.
  • More bandwidth & disk I/O for staged rollouts or retries across many miners on the same node.

@ankitgoswami ankitgoswami merged commit 77fa6eb into main Jun 29, 2026
52 checks passed
@ankitgoswami ankitgoswami deleted the ankitg/fleetnode-firmware-uploads branch June 29, 2026 16:04